AN ITERATIVE AND REGENERATIVE DNA SEQUENCING METHOD

Background of the Invention 

Analysis of DNA with currently available techniques provides a spectrum of 
information ranging from the confirmation that a test DNA is the same as or different from
a standard sequence or an isolated fragment, to the express identification and ordering of 
each nucleotide of the test DNA. Not only are such techniques crucial for understanding 
the function and control of genes and for applying many of the basic techniques of 
molecular biology, but they have also become increasingly important as tools in 
genomic analysis and a great many non-research applications, such as genetic
identification, forensic analysis, genetic counseling, medical diagnostics and many 
others. In these latter applications, both techniques providing partial sequence 
information, such as fingerprinting and sequence comparisons, and techniques providing 
full sequence determination have been employed (Gibbs et al., Proc. Natl. Acad. Sci.
USA 1989; 86:1919-1923; Gyllensten et al., Proc. Natl. Acad. Sci. USA 1988;
85:7652-7656; Carrano et al., Genomics 1989; 4:129-136; Caetano-Anolles et al., Mol. Gen.
Genet. 1992; 235:157-165; Brenner and Livak, Proc. Natl. Acad. Sci. USA 1989;
86:8902-8906; Green et al., PCR Methods and Applications 1991; 1:77-90; and
Versalovic et al., Nucleic Acids Res. 1991; 19:6823-6831).

DNA sequencing methods currently available require the generation of a set of
DNA fragments that are ordered by length according to nucleotide composition. The 
generation of this set of ordered fragments occurs in one of two ways: chemical 
degradation at specific nucleotides using the Maxam Gilbert method (Maxam AM and 
W Gilbert, Proc Natl Acad Sci USA 1977; 74:560-564) or dideoxy nucleotide 
incorporation using the Sanger method (Sanger F, S Nicklen, and AR Coulson, Proc
Natl Acad Sci USA 1977; 74:5463-5467) so that the type and number of required steps 
inherently limits both the number of DNA segments that can be sequenced in parallel, 
and the number of operations which may be carried out in sequence. Furthermore, both 
methods are prone to error due to the anomalous migration of DNA fragments in 
denaturing gels. Time and space limitations inherent in these gel-based methods have
fueled the search for alternative methods.

Several methods are under development that are designed to sequence DNA in a
solid state format without a gel resolution step. The method that has generated the most 
interest is sequencing by hybridization. In sequencing by hybridization, the DNA 
sequence is read by determining the overlaps between the sequences of hybridized 
oligonucleotides. This strategy is possible because a long sequence can be deduced by
matching up distinctive overlaps between its constituent oligomers (Strezoska Z, T 
Paunesku, D Radosavljevic, I Labat, R Drmanac, R Crkvenjakov, Proc Natl Acad Sci 
USA 1991; 88:10089-10093; Drmanac R, S Drmanac, Z Strezoska, T Paunesku, I Labat, 
M Zeremski, J Snoddy, WK Funkhouser, B Koop, L Hood, R Crkvenjakov, Science 
1993; 260:1649-1652). This method uses hybridization conditions for oligonucleotide
probes that distinguish between complete complementarity with the target sequence and 
a single nucleotide mismatch, and does not require resolution of fragments on 
polyacrylamide gels (Jacobs KA, R Rudersdorf, SD Neill, JP Dougherty, EL Brown, and 
EF Fritsch, Nucleic Acids Res. 1988; 16:4637-4650). Recent versions of sequencing by
hybridization add a DNA ligation step in order to increase the ability of this method to
discriminate between mismatches, and to decrease the length of the oligonucleotides 
necessary to sequence a given length of DNA (Broude NE, T Sano, CL Smith, CR 
Cantor, Proc. Natl. Acad Sci. USA 1994;91:3072-3076, Drmanac RT, International 
Business Communications, Southborough, MA). Significant obstacles with this method 
are its inability to accurately position repetitive sequences in DNA fragments, inhibition
of probe annealing by the formation of internal duplexes in the DNA fragments, and the 
influence of nearest neighbor nucleotides within and adjacent to an annealing domain on 
the melting temperature for hybridization (Riccelli PV, AS Benight, Nucleic Acids Res 
1993;21:3785-3788, Williams JC, SC Case-Green, KU Mir, EM Southern. Nucleic 
Acids Res 1994;22:1365-1367). Furthermore, sequencing by hybridization cannot
determine the length of tandem short repeats, which are associated with several human 
genetic diseases (Warren ST, Science 1996; 271:1374-1375). These limitations have 
prevented its use as a primary sequencing method. 
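
The overlap-matching idea and the tandem-repeat limitation described above can be illustrated with a minimal sketch. It assumes an idealized hybridization result that records only the presence or absence of each oligomer; the function names `kmer_spectrum` and `extend_from`, the probe length of 4, and the example sequences are hypothetical and are not taken from the cited references.

```python
def kmer_spectrum(seq, k):
    """Return the set of all length-k substrings of seq.

    This stands in for an idealized hybridization readout that reports
    only which oligomers are present, not how many times each occurs.
    """
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def extend_from(start, spectrum):
    """Extend `start` one nucleotide at a time using (k-1)-nucleotide overlaps.

    Returns (sequence, n_candidates). n_candidates is 0 when no unused
    oligomer overlaps the current end, and greater than 1 when the
    extension is ambiguous, as happens inside a repeat.
    """
    k = len(start)
    seq = start
    used = {start}
    while True:
        suffix = seq[-(k - 1):]
        candidates = [p for p in spectrum if p.startswith(suffix) and p not in used]
        if len(candidates) != 1:
            return seq, len(candidates)
        used.add(candidates[0])
        seq += candidates[0][-1]


# A sequence without repeats is recovered from its overlapping 4-mers.
unique = "ATGCGTAC"
print(extend_from("ATGC", kmer_spectrum(unique, 4)))

# Two tandem-repeat alleles of different length give the same set of 4-mers,
# so this idealized readout cannot tell them apart.
short_allele = "ATCACACAGG"
long_allele = "ATCACACACAGG"
print(kmer_spectrum(short_allele, 4) == kmer_spectrum(long_allele, 4))
```

In this toy model the two repeat alleles are indistinguishable, and extension stops at the first branch inside the repeat, which mirrors the obstacle noted for repetitive sequences.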

The base addition DNA sequencing scheme uses fluorescently labeled reversible 
terminators of polymerase extension, with a distinct and removable fluorescent label for
each of the four nucleotide analogs (Metzker ML, Raghavachari R, Richards S, Jacutin
SE, Civitello A, Burgess K and RA Gibbs, Nucleic Acids Res. 1994; 22:4259-4267;
Canard B and RS Sarfati, Gene 1994; 148:1-6). Incorporation of one of these base 
analogs into the growing primer strand allows identification of the incorporated 
nucleotide by its fluorescent label. This is followed by removal of the 
protecting/fluorescent group, creating a new substrate for template-directed polymerase
extension. Iteration of these steps is designed to permit sequencing of a multitude of 
templates in a solid state format. Technical obstacles include a relatively low efficiency 
of extension and deprotection, and interference with primer extension caused by single- 
strand DNA secondary structure. A fundamental limitation to this approach is inherent 
in iterative methods that sequence consecutive nucleotides. That is, in order to sequence
more than a handful of nucleotides, each cycle of analog incorporation and deprotection
must approach 100% efficiency. Even if the base addition sequencing scheme is refined
so that each cycle occurs at 95% efficiency, one will have < 75% of the product of
interest after only 6 cycles (0.95^6 = 0.735). This will severely limit the ability of this
method to sequence anything but very short DNA sequences. Only one cycle of
template-directed analog incorporation and deprotection appears to have been 
demonstrated so far (Metzker ML, Raghavachari R, Richards S, Jacutin SE, Civitello A, 
Burgess K and RA Gibbs, Nucleic Acids Res. 1994; 22:4259-4267; Canard B and RS 
Sarfati, Gene 1994; 148:1-6). A related earlier method, which is designed to sequence 
only one nucleotide per template, uses radiolabeled nucleotides or conventional non-
reversible terminators attached to a variety of labels (Sokolov BP, Nucleic Acids 
Research 1989;18:3671; Kuppuswamy MN, JW Hoffman, CK Kasper, SG Spitzer, SL 
Groce, and SP Bajaj, Proc. Natl. Acad. Sci. USA 1991; 88:1143-1147). Recently, this
method has been called solid-phase minisequencing (Syvanen AC, E Ikonen, T
Manninen, M Bengstrom, H Soderlund, P Aula, and L Peltonen, Genomics 1992;
12:590-595; Kobayashi M, Rappaport E, Blasband A, Semeraro A, Sartore M, Surrey S,
Fortina P., Molecular and Cellular Probes 1995; 9:175-182) or genetic bit analysis
(Nikiforov TT, RB Rendle, P Goelet, YH Rogers, ML Kotewicz, S Anderson, GL
Trainor, and MR Knapp, Nucleic Acids Research 1994; 22:4167-4175), and it has been
used to verify the parentage of thoroughbred horses (Nikiforov TT, RB Rendle, P
Goelet, YH Rogers, ML Kotewicz, S Anderson, GL Trainor, and MR Knapp, Nucleic
Acids Research 1994; 22:4167-4175).
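
The compounding loss described above for consecutive-nucleotide methods can be checked with a short calculation. This is a minimal sketch that assumes every cycle has the same independent efficiency; the function names and the 50% threshold in the usage lines are illustrative assumptions.

```python
def cumulative_yield(per_cycle_efficiency, cycles):
    """Fraction of product still in register after `cycles` cycles,
    assuming each cycle succeeds independently with the same efficiency."""
    return per_cycle_efficiency ** cycles


def max_cycles(per_cycle_efficiency, min_yield):
    """Largest number of cycles whose cumulative yield is still >= min_yield."""
    if not 0 < per_cycle_efficiency < 1:
        raise ValueError("efficiency must be strictly between 0 and 1")
    if not 0 < min_yield <= 1:
        raise ValueError("min_yield must be in (0, 1]")
    n = 0
    while cumulative_yield(per_cycle_efficiency, n + 1) >= min_yield:
        n += 1
    return n


# The figure quoted in the text: six cycles at 95% efficiency.
print(round(cumulative_yield(0.95, 6), 3))

# Hypothetical question: how many cycles keep at least half of the product?
print(max_cycles(0.95, 0.5))
```

Under this simple model the usable read length is set almost entirely by the per-cycle efficiency, which is the limitation the passage attributes to the base addition scheme.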

An alternative method for DNA sequencing that remains in the development
phase entails the use of flow cytometry to detect single molecules. In this method, one 
strand of a DNA molecule is synthesized using fluorescently labeled nucleotides, and the 
labeled DNA molecule is then digested by a processive exonuclease, with identification 
of the released nucleotides over real time using flow cytometry. Technical obstacles to 
the implementation of this method include the fidelity of incorporation of the 
fluorescently labeled nucleotides and turbulence created around the microbead to which 
the single molecule of DNA is attached (Davis LM, FR Fairfield, CA Harger, JH Jett, 
RA Keller, JH Hahn, LA Krakowski, BL Marrone, JC Martin, HL Nutter, RL Ratliff,
EB Shera, DJ Simpson, SA Soper, Genetic Analysis, Techniques, and Applications 
1991; 8:1-7). Furthermore, this method is not amenable to sequencing numerous DNA 
segments in parallel. 

Another DNA sequencing method has recently been developed that uses class- 
IIS restriction endonuclease digestion and adaptor ligation to sequence at least some 
nucleotides offset from a terminal nucleotide. Using this method, four adjacent 
nucleotides have reportedly been sequenced and read following the gel resolution of 
DNA fragments. However, a limitation of this sequencing method is that it has built-in 
product losses, and requires many iterative cycles (International Application 
PCT/US95/03678). 

Another problem exists with currently available technologies in the area of 
diagnostic sequencing. An ever widening array of disorders, susceptibilities to 
disorders, prognoses of disease conditions, and the like, have been correlated with the 
presence of particular DNA sequences, or the degree of variation (or mutation) in DNA 
sequences, at one or more genetic loci. Examples of such phenomena include human 
leukocyte antigen (HLA) typing, cystic fibrosis, tumor progression and heterogeneity, 
p53 proto-oncogene mutations, and ras proto-oncogene mutations (Gyllensten et al.,
PCR Methods and Applications, 1:91-98 (1991); International application
PCT/US92/01675; and International application PCT/CA90/00267). A difficulty in 
determining DNA sequences associated with such conditions to obtain diagnostic or
prognostic information is the frequent presence of multiple subpopulations of DNA, e.g.,
allelic variants, multiple mutant forms, and the like. Distinguishing the presence and 
identity of multiple sequences with current sequencing technology is impractical due to 
the amount of DNA sequencing required. 

Summary of the Invention 

The present invention provides an alternative approach for sequencing DNA that 
does not require high resolution separations and that generates signals more amenable to 
analysis. The methods of the present invention can also be easily automated. This
provides a means for readily analyzing DNA from many genetic loci. Furthermore, the
DNA sequencing method of the present invention does not require the gel resolution of
DNA fragments, which allows for the simultaneous sequencing of cDNA or genomic
DNA library inserts. Therefore, the full length transcribed sequences or genomes can be 
obtained very rapidly with the methods of the present invention. The method of the
present invention further provides a means for the rapid sequencing of previously
uncharacterized viral, bacterial or protozoan human pathogens, as well as the sequencing 
of plants and animals of interest to agriculture, conservation, and/or science. 

The present invention pertains to methods which can sequence multiple DNA 
segments in parallel, without running a gel. Each DNA sequence is determined without
ambiguity, as this novel method sequences DNA in discrete intervals that start at one
end of each DNA segment. The method of the present invention is carried out on DNA 
that is almost entirely double-stranded, thus preventing the formation of secondary 
structures that complicate the known sequencing methods that rely on hybridization to 
single-stranded templates (e.g., sequencing by hybridization), and overcoming obstacles
posed by microsatellite repeats, other direct repeats, and inverted repeats, in a given
DNA segment. The iterative and regenerative DNA sequencing method described herein 
also overcomes the obstacles to sequencing several thousand distinct DNA segments 
attached to addressable sites on a matrix or a chip, because it is carried out in iterative 
steps and in various embodiments effectively preserves the sample through a multitude
of sequencing steps, or creates a nested set of DNA segments to which a few steps are
applied in common. It is, therefore, highly suitable for automation. Furthermore, the
present invention particularly addresses the problem of increasing throughput in DNA
sequencing, both in number of steps and parallelism of analyses, and it will facilitate the 
identification of disease-associated gene polymorphisms, with particular value for 
sequencing entire genomes and for characterizing the multiple gene mutations
underlying polygenic traits. Thus, the invention pertains to novel methods for
generating staggered templates and for iterative and regenerative DNA sequencing as 
well as to methods for automated DNA sequencing. 

Accordingly, the invention features a method for identifying a first nucleotide n 
and a second nucleotide n + x in a double stranded nucleic acid segment. The method
includes (a) digesting the double stranded nucleic acid segment with a restriction
enzyme to produce a double stranded molecule having a single stranded overhang 
sequence corresponding to an enzyme cut site; (b) providing an adaptor having a cycle 
identification tag, a restriction enzyme recognition domain, a sequence identification 
region, and a detectable label; (c) hybridizing the adaptor to the double stranded nucleic
acid having the single-stranded overhang sequence to form a ligated molecule; (d)
identifying the nucleotide n by identifying the ligated molecule; (e) amplifying the 
ligated molecule from step (d) with a primer specific for the cycle identification tag of 
the adaptor; and (f) repeating steps (a) through (d) on the amplified molecule from step 
(e) to yield the identity of the nucleotide n + x, wherein x is less than or equal to the
number of nucleotides between a recognition domain for a restriction enzyme and an
enzyme cut site. 
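
The cycle of digesting, identifying nucleotide n, regenerating, and repeating to reach nucleotide n + x can be pictured with a deliberately abstract sketch. It ignores strands, overhang geometry, adaptors, and amplification, and treats the segment as a plain string read from the end being sequenced; the function name, the `reach` parameter (standing in for the number of nucleotides between the recognition domain and the cut site), and the example values are assumptions for illustration only.

```python
def iterative_interval_reads(segment, x, reach, cycles):
    """Toy model of interval sequencing.

    Each cycle identifies the nucleotide at the currently exposed end and
    then removes a block of x nucleotides, so successive reads fall at
    positions n, n + x, n + 2x, ... of the original segment.

    segment: sequence written from the end being sequenced (a plain string)
    x:       nucleotides removed per cycle; must not exceed `reach`
    reach:   assumed distance between recognition domain and cut site
    cycles:  maximum number of cycles to simulate
    """
    if not 1 <= x <= reach:
        raise ValueError("x must be between 1 and reach")
    reads = []
    offset = 0            # position of the exposed end in the original segment
    template = segment
    for _ in range(cycles):
        if not template:
            break
        reads.append((offset, template[0]))  # identify the exposed nucleotide
        template = template[x:]              # digestion shortens the template by x
        offset += x
    return reads


# Hypothetical 12-nucleotide segment, trimmed 3 nucleotides per cycle.
print(iterative_interval_reads("GATTACAGGCTA", x=3, reach=9, cycles=4))
```

The sketch only tracks which positions are read; it makes no claim about reaction yields or about how any particular enzyme behaves.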

In another aspect, the invention features a method for identifying a first 
nucleotide n and a second nucleotide n + x in a double stranded nucleic acid segment. 
The method includes (a) digesting the double stranded nucleic acid segment with a
restriction enzyme to produce a double stranded molecule having a single stranded
overhang sequence corresponding to an enzyme cut site; (b) providing an adaptor having
a cycle identification tag, a restriction enzyme recognition domain, and a sequence
identification region; (c) hybridizing the adaptor to the double stranded nucleic acid
having the single-stranded overhang sequence to form a ligated molecule; (d) amplifying
the ligated molecule from step (c) with a labeled primer specific for the cycle
identification tag, restriction enzyme recognition domain and a portion of the sequence
identification region of the adaptor; (e) identifying the nucleotide n by identifying the
primer incorporated into the amplification product; and (f) repeating steps (b) through 
(e) to yield the identity of the nucleotide n + x in each of the staggered double stranded 
molecules having the single strand overhang sequence thereby sequencing an interval 
within the double stranded nucleic acid segment, wherein x is greater than one and no
greater than the number of nucleotides between a recognition domain for a restriction 
enzyme and an enzyme cut site. 

In another aspect, the invention provides a method for identifying a first 
nucleotide n and a second nucleotide n + x in a double stranded nucleic acid segment.
The method includes (a) digesting the double stranded nucleic acid segment with a
restriction enzyme to produce a trimmed end in the double stranded molecule; (b) 
providing an adaptor having a cycle identification tag and a restriction enzyme 
recognition domain; (c) ligating the adaptor to the trimmed end of the double stranded 
nucleic acid to form a ligated molecule; (d) amplifying the ligated molecule from step
(c) with a labeled primer specific for the cycle identification tag, the restriction enzyme
recognition domain of the adaptor, and for a nucleotide in the trimmed end in the double 
stranded molecule; (e) identifying the nucleotide n by identifying the primer 
incorporated into the amplification product; and (f) repeating steps (a) through (e) on the
amplified molecule from step (e) to yield the identity of the nucleotide n + x, wherein x
is less than or equal to the number of nucleotides between a recognition domain for a
restriction enzyme and an enzyme cut site. 

In another aspect, the invention features a method for sequencing an interval 
within a double stranded nucleic acid segment by identifying a first nucleotide n and a 
second nucleotide n + x in a plurality of staggered double stranded molecules produced
from the double stranded nucleic acid segment. The method includes (a) attaching an
enzyme recognition domain to different positions along the double stranded nucleic acid 
segment within an interval no greater than the distance between a recognition domain 
for a restriction enzyme and an enzyme cut site, such attachment occurring at one end of 
the double stranded nucleic acid segment; (b) digesting the double stranded nucleic acid
segment with a restriction enzyme to produce a plurality of staggered double stranded
molecules each having a single stranded overhang sequence corresponding to the cut
site; (c) providing an adaptor having a restriction enzyme recognition domain, a
sequence identification region, and a detectable label; (d) hybridizing the adaptor to the 
double stranded nucleic acid having the single-stranded overhang sequence to form a 
ligated molecule; (e) identifying a nucleotide n within a staggered double stranded 
molecule by identifying the ligated molecule; and (f) repeating steps (b) through (e) to yield
the identity of the nucleotide n + x in each of the staggered double stranded molecules 
having the single strand overhang sequence thereby sequencing an interval within the 
double stranded nucleic acid segment, wherein x is greater than one and no greater than 
the number of nucleotides between a recognition domain for a restriction enzyme and an 
enzyme cut site.
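
How staggered templates fill in the gaps between interval reads can be shown with a small sketch. It assumes x templates whose starting ends are offset from one another by one nucleotide, each read at positions spaced x apart; the function name and the example sequence are hypothetical, and the model leaves out all enzymology.

```python
def read_staggered(segment, x, cycles):
    """Combine interval reads from x staggered templates.

    Template number `offset` (0 <= offset < x) is read at positions
    offset, offset + x, offset + 2x, ... for `cycles` cycles. Merging
    the reads from all templates gives a contiguous stretch of sequence.
    """
    if x < 1:
        raise ValueError("x must be at least 1")
    reads = {}
    for offset in range(x):            # one staggered template per offset
        for cycle in range(cycles):
            pos = offset + cycle * x
            if pos < len(segment):
                reads[pos] = segment[pos]
    # Order the merged reads by position in the original segment.
    return "".join(reads[p] for p in sorted(reads))


# Three staggered templates, two cycles each, cover the first six positions.
print(read_staggered("ACGTACGTAC", x=3, cycles=2))
```

Each template on its own reports only every x-th nucleotide; it is the set of staggered starting points that makes the merged read contiguous in this model.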

In another aspect, the invention features a method for identifying a first 
nucleotide n and a second nucleotide n + x in a double stranded nucleic acid segment. 
The method includes (a) digesting the double stranded nucleic acid segment with a 
restriction enzyme to produce a double stranded molecule having a 5' single stranded
overhang sequence corresponding to an enzyme cut site; (b) identifying the nucleotide n
by template-directed polymerization with a labeled nucleotide or nucleotide terminator; 
(c) providing an adaptor having a cycle identification tag and a restriction enzyme 
recognition domain; (d) ligating the adaptor to the double stranded nucleic acid to form a 
ligated molecule; (e) amplifying the ligated molecule from step (d) with a primer
specific for the cycle identification tag of the adaptor; and (f) repeating steps (a) through
(b) on the amplified molecule from step (e) to yield the identity of the nucleotide n + x, 
wherein x is less than or equal to the number of nucleotides between a recognition 
domain for a restriction enzyme and an enzyme cut site. 

Yet another aspect of the invention pertains to a method for sequencing an
interval within a double stranded nucleic acid segment by identifying a first nucleotide n
and a second nucleotide n + x in a plurality of staggered double stranded molecules 
produced from the double stranded nucleic acid segment. The method includes (a) 
attaching an enzyme recognition domain to different positions along the double stranded 
nucleic acid segment within an interval no greater than the distance between a
recognition domain for a restriction enzyme and an enzyme cut site, such attachment
occurring at one end of the double stranded nucleic acid segment; (b) digesting the
double stranded nucleic acid segment with a restriction enzyme to produce a plurality of
staggered double stranded molecules each having a 5' single stranded overhang sequence 
corresponding to the cut site; (c) identifying a nucleotide n within a staggered double 
stranded molecule by template-directed polymerization with a labeled nucleotide or
nucleotide terminator; (d) providing an adaptor having a restriction enzyme recognition
domain; (e) ligating the adaptor to the double stranded nucleic acid to form a ligated
molecule; and (f) repeating steps (b) through (c) to yield the identity of the nucleotide n + x
in each of the staggered double stranded molecules having the single strand overhang 
sequence thereby sequencing an interval within the double stranded nucleic acid
segment, wherein x is greater than one and no greater than the number of nucleotides
between a recognition domain for a restriction enzyme and an enzyme cut site. 

The invention also pertains to a method for removing all or a part of a primer 
sequence from a primer extended product. The method includes (a) providing a primer 
sequence encoding a methylated portion of a restriction endonuclease recognition
domain, wherein recognition of the domain by a restriction endonuclease requires at
least one methylated nucleotide; (b) polymerizing by a template-directed primer 
extension using the primer and a nucleic acid segment to generate a primer extended 
product; and (c) digesting the primer extended product with a restriction endonuclease 
that recognizes the resulting double-stranded restriction endonuclease recognition
domain encoded by the primer sequence in the primer extended product.

In another aspect, the invention provides a method for removing all or part of a 
primer sequence from a primer extended product. The method includes (a) providing a
primer sequence encoding a portion of a restriction endonuclease recognition domain;
(b) polymerizing by a template-directed primer extension using the primer, a methylated
nucleotide, and a nucleic acid segment to generate a primer extended product during
nucleic acid amplification in vitro, wherein the non-methylated nucleotide corresponding
to the methylated nucleotide is contained within the portion of the recognition domain
sequence in the primer sequence; and (c) digesting the primer extended product with a
restriction endonuclease that recognizes the resulting hemi-methylated double-stranded
restriction endonuclease recognition domain encoded by the primer sequence in the
primer extended product, and does not recognize the double-methylated products
resulting from nucleic acid amplification in vitro. 

A still further aspect of the invention pertains to a method for blocking a 
restriction endonuclease recognition domain in a primer extended product. The method
includes (a) providing a primer with at least one modified nucleotide, wherein the
modified nucleotide blocks an enzyme recognition domain, and at least a portion of the 
enzyme recognition domain sequence is encoded in the primer; (b) polymerizing by a 
template-directed primer extension using the primer and a nucleic acid segment to 
generate a primer extended product; and (c) digesting the primer extended product with
an enzyme that recognizes a double-stranded enzyme recognition domain in the primer
extended product. 

In another aspect of the invention there is provided a method and device for 
automated sequencing of double-stranded DNA segments with nested single strand 
overhang templates, wherein a plurality of double-stranded DNA segments are
immobilized at sites of a microtiter support or chip array having a plurality of sample
holders arrayed in a matrix of positions on the support. Each DNA segment has an end 
comprising a single-strand overhang template sequence no longer than about twenty 
nucleotides in length. The device then implements a protocol simultaneously treating all 
sample holders with one or more reagents which selectively react with at least one
nucleotide of the single-strand overhang template to effectively label the material at each
holder, then reading the array by automated detection to determine at least one 
nucleotide of the single-strand overhang template at each position. Thereafter, the 
method proceeds by reducing the length of each strand of the DNA segment at each holder
by a fixed number n > 1 at the overhang end, thus yielding a homologously ordered array
of shorter and nested DNA segments, each with a single-strand overhang template
sequence, which preferably remain immobilized at the same positions on the support 
where the treatment protocol is repeated to determine at least one nucleotide at each 
single-strand overhang sequence. The steps of treating, reading and reducing the length 
of the strands of the DNA segment at each holder by a number of n > 1 nucleotides are
iteratively performed as automated process steps to produce nested and progressively
shorter DNA segments and to sequence the plurality of DNA segments immobilized at
the array of sample holders in situ. 

In another aspect the invention includes a method for automated sequencing of 
double stranded DNA segments by attaching a recognition domain to each segment to
form a set of DNA segments having the recognition domain nested at an interval no
greater than the distance between the recognition domain and its cut site for a given 
enzyme that recognizes the recognition domain; treating the DNA segments with an 
enzyme that recognizes the attached recognition domain and cuts each strand of each 
DNA segment to create an overhang template at a distance of > 1 nucleotide along the
DNA segment from the recognition domain so as to generate a set of nested overhang
templates; and determining at least one nucleotide of each of the nested overhang 
templates. Thereafter, the method proceeds by reducing length of each strand at the end 
of the DNA segment with the overhang template by > 1 nucleotide to produce a 
corresponding set of shorter DNA segments each with an overhang template. The step of
reducing is performed by removing a block of nucleotides, so that each shorter DNA
segment with an overhang template is a known subinterval of a previous DNA segment 
with overhang. 

In another aspect of the invention there is provided a method and device for 
automated sequencing of double-stranded DNA segments, wherein a plurality of
double-stranded DNA segments are immobilized at sites of a microtiter support or chip array
having a plurality of sample holders arrayed in a matrix of positions on the support. Each 
DNA segment has an end comprising a single-strand overhang template sequence no 
longer than about twenty nucleotides in length. The device then simultaneously treats all 
sample holders with one or more reagents which selectively react with at least one
nucleotide of the single-strand overhang template to effectively label the material at each
holder, and reading the array by automated detection to determine at least one nucleotide 
of the single-strand overhang template at each position. Thereafter, the method proceeds 
by regenerating material at the respective sample holders by DNA amplification in vitro 
and reducing the length of each strand of the regenerated DNA segment at each holder by a
fixed number n > 1 at the overhang end, thus yielding a homologously ordered array of
nested DNA segments, each with a single-strand overhang template sequence, which
preferably remain immobilized at the same positions on the support, and the treatment
protocol is repeated to determine at least one nucleotide at each single-strand overhang 
sequence. The steps of treating, reading, regenerating and reducing the length of the 
strands of the DNA segment at each holder by a number of n > 1 nucleotides are
iteratively performed as automated process steps to produce nested and progressively
shorter DNA segment ends and to sequence the plurality of DNA segments immobilized 
at the array of sample holders in situ. 

In another aspect the invention includes a method for automated sequencing of 
double stranded DNA segments by attaching a recognition domain to each segment to
form DNA segments having the recognition domain, regenerating the template precursor
by DNA amplification in vitro, treating the DNA segments with an enzyme that 
recognizes the attached recognition domain and cuts each strand of each DNA segment 
to create an overhang template at a distance of > 1 nucleotide along the DNA segment 
from the recognition domain, and determining at least one nucleotide of the overhang
template. The method includes the step of reducing the length of each strand at the end of
the DNA segment with the overhang template by > 1 nucleotide to produce a 
corresponding set of shortened DNA segments each with an overhang template, the step 
of reducing being performed by removing a block of nucleotides, so that each shortened 
DNA segment with an overhang template is a known subinterval of a previous DNA
segment with overhang.

The invention further contemplates an automated instrument for effectively 
performing the sequencing, wherein a stage carries the support on a device equipped for 
providing the respective buffers, solutions and reagents, for stepping or positioning the 
array for reading, and in some embodiments robotic manipulation for sample transfer,
and heating for amplification, e.g., treating at least a portion of material at each sample
holder with a primer and heat cycling to regenerate material at the respective sample 
holders. The stage may be rotatable, spinning to cause fluid provided at a central 
position to centrifugally flow across the array to alter material immobilized in the
sample holders. Preferably the stage holds plural support arrays, and may operate
robotically to transfer material from the sites of one support array to the sites of another
support array, so that all the samples on one support may undergo one set of process
steps in common (e.g., washing, digestion, labeling) while those on the other support
undergo another (e.g., heating/amplification or scintillation reading). 

Generally, the methods of the invention are applicable to all tasks where DNA 
sequencing is employed, including medical diagnostics, genetic mapping, genetic 
identification, forensic analysis, molecular biology research, and the like.

Brief Description of the Drawings 

Figure 1 is a schematic diagram of an interval DNA sequencing method using a
class-IIS restriction endonuclease that generates a 5' overhang (FokI), template-directed
ligation to labeled adaptors, and PCR. DNA encoded by oligonucleotides or their PCR
generated complements is depicted as thick lines. Following each cycle the template 
precursor is shortened. 

Figure 2 is a schematic diagram of an interval DNA sequencing method using a
class-IIS restriction endonuclease that generates a 3' overhang (BseRI), template-directed
ligation to labeled adaptors, and PCR. DNA encoded by oligonucleotides or their PCR
generated complements is depicted as thick lines. Following each cycle the template 
precursor is shortened. 

Figure 3 is a schematic diagram of an interval DNA sequencing method using a
class-IIS restriction endonuclease that generates a 5' overhang (FokI), template-directed
polymerase extension with labeled terminators, template-directed ligation, and PCR.
DNA encoded by oligonucleotides or their PCR generated complements is depicted as 
thick lines. Following each cycle the template precursor is shortened. 

Figure 4 is a schematic diagram of an interval DNA sequencing method using a
class-IIS restriction endonuclease that generates a 5' overhang (FokI), template-directed
polymerase extension with labeled terminators, template-directed ligation, and PCR.
The template complementary to the template in Figure 3 is attached to a solid phase and 
is sequenced. DNA encoded by oligonucleotides or their PCR generated complements is 
depicted as thick lines. Following each cycle the template precursor is shortened. 

Figure 5 is a photograph depicting the size of the initial template precursor and
of subsequent template precursors following each of five iterative sequencing simulation
cycles consisting of FokI digestion, adaptor ligation, fill-in with ddNTPs, and PCR
amplification, run on a 12% denaturing acrylamide gel. Lane 1, MW markers (17-mer,
25-mer, 37-mer, 48-mer, 70-mer); Lane 2, initial template precursor: 93 base pair PCR
product amplified from human genomic DNA; Lane 3, template precursor following
sequencing cycle #1 (90 bp); Lane 4, template precursor following sequencing cycle #2
(82 bp); Lane 5, template precursor following sequencing cycle #3 (72 bp); Lane 6,
template precursor following sequencing cycle #4 (64 bp); Lane 7, template precursor
following sequencing cycle #5 (54 bp).

Figure 6 is a schematic diagram which illustrates the removal of primer encoded
sequence from a PCR product by amplification with a primer encoding a DpnI
recognition domain, which requires a methylated nucleotide, followed by cutting with DpnI.
The primer sequences are underlined. The primer encoding the DpnI recognition
domain had two mismatches with the original PCR template, and the two mismatched
nucleotides are depicted in bold.

Figure 7 is a photograph depicting DpnI cutting of a PCR product, such cutting
directed by a methylated primer sequence, run on an acrylamide gel: lane 1, 33 µl (1 µg)
of uncut 55 bp PCR product; lane 2, 33 µl of 55 bp PCR product cut with 20 U DpnI,
generating a 40 bp product; lane 3, 33 µl of 55 bp PCR product cut with 100 U DpnI,
generating a 40 bp product; lane 4, MW markers (17-mer, 25-mer, 37-mer, 48-mer,
70-mer).

Figure 8 is a schematic representation of an instrument for automated
sequencing of multiple DNA segments.

Figure 9 is a schematic representation of chips and reagents for DNA sequencing 
on a disk. The transfer of reagents to multiple chips occurs through centrifugal force by 
disk rotation. 

Detailed Description of the Invention 

The present invention pertains to an iterative and regenerative method for 
sequencing DNA that exploits the separation of the restriction enzyme recognition and 
cleavage domains in class-IIS restriction endonucleases, as well as adaptor ligation, to 
generate a series of sequencing templates that are separated from each other by a discrete
interval. These sequencing templates constitute a set of single-strand overhangs that can
then be sequenced by template-directed ligation, template-directed polymerization, or by
stringent hybridization of oligonucleotides or oligonucleotide analogs. 

The present invention features a method for identifying a first nucleotide n and a 
second nucleotide n + x in a double stranded nucleic acid segment. The method includes
(a) digesting the double stranded nucleic acid segment with a restriction enzyme to
produce a double stranded molecule having a single stranded overhang sequence
corresponding to an enzyme cut site and (b) providing an adaptor having a cycle
identification tag, a restriction enzyme recognition domain, a sequence identification
region, and a detectable label. The method further includes (c) hybridizing the adaptor
to the double stranded nucleic acid having the single-stranded overhang sequence to
form a ligated molecule, (d) identifying the nucleotide n by identifying the ligated
molecule, and (e) amplifying the ligated molecule from step (d) with a primer specific
for the cycle identification tag of the adaptor. The method also includes (f) repeating
steps (a) through (d) on the amplified molecule from step (e) to yield the identity of the
nucleotide n + x, wherein x is less than or equal to the number of nucleotides between a
recognition domain for a restriction enzyme and an enzyme cut site. As is described
more fully below, the order of steps (a) through (f) may vary with different embodiments
of the invention.

As used herein, the term "nucleotide n" refers to a nucleotide along a given 
nucleic acid segment. "Nucleotide" is an art-recognized term and includes molecules
which are the basic structural units of nucleic acids, e.g., RNA or DNA, and which are
composed of a purine or pyrimidine base, a ribose or a deoxyribose sugar, and a
phosphate group. A "modified nucleotide," as used herein, refers to a nucleotide that has
been chemically modified, e.g., a methylated nucleotide. "Analogs" in reference to
nucleotides includes synthetic nucleotides having modified base moieties and/or
modified sugar moieties, e.g., as described generally by Scheit, Nucleotide Analogs
(John Wiley, New York, 1980). Such analogs include synthetic nucleotides designed to
enhance binding properties, induce degeneracy, increase specificity, and the like. In the
methods described herein, n designates a fixed position within a single stranded
overhang sequence extending from each double stranded nucleic acid segment.
Preferably, nucleotide n is selected by digesting a given double stranded nucleic acid
segment with a restriction enzyme, e.g., a class IIS restriction endonuclease, to generate
a 5' or a 3' single stranded overhang sequence corresponding to the cut site, and n is the
first or the last unpaired nucleotide in the overhang sequence. 

As used herein, the term "nucleotide n + x" refers to a second nucleotide in a
given nucleic acid segment which is separated from nucleotide n by x nucleotides along
a nucleic acid segment. For methods described herein, "x" is a number which is less
than or equal to the number of nucleotides between a restriction enzyme recognition
domain and the corresponding enzyme cut site for a given enzyme. By convention, "x"
is defined by two integers which give the number of nucleotides between the recognition
site and the hydrolyzed phosphodiester bonds of each strand of a nucleic acid segment.
Preferably, x is no longer than about 9 nucleotides, more preferably x is no longer than
about 18, 20 or 30 nucleotides, and advantageously it is in the range between about 40
and 60 nucleotides in length. For example, the recognition and cleavage properties of
FokI are typically represented as "GGATG(9/13)" because it recognizes and cuts a
double stranded nucleic acid as follows:

5'-...NNGGATGNNNNNNNNN^NNNNNNNNNN...
3'-...NNCCTACNNNNNNNNNNNNN^NNNNNN...

where GGATG (with its complement CCTAC) is FokI's recognition site, ^ marks the
cleaved bond on each strand, and the N's are arbitrary nucleotides and their complements.
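
By way of illustration only, the "GGATG(9/13)" convention can be sketched in Python as follows. The function name foki_cut_sites and the example sequence are hypothetical and form no part of the described method; the sketch assumes a linear top strand written 5' to 3' and searches only the GGATG orientation of the site.

```python
SITE = "GGATG"      # FokI recognition sequence on the top strand
TOP_OFFSET = 9      # nucleotides between the site and the top-strand cut
BOTTOM_OFFSET = 13  # nucleotides between the site and the bottom-strand cut


def foki_cut_sites(top):
    """Return (site_start, top_cut, bottom_cut, overhang) for each GGATG found.

    Coordinates are 0-based indices into the top strand (5'->3'). A cut index
    k means the bond between positions k-1 and k of that strand is broken.
    """
    hits = []
    start = top.find(SITE)
    while start != -1:
        site_end = start + len(SITE)
        top_cut = site_end + TOP_OFFSET
        bottom_cut = site_end + BOTTOM_OFFSET
        if bottom_cut <= len(top):
            # The 4 nt lying between the two cuts form the 5' overhang.
            hits.append((start, top_cut, bottom_cut, top[top_cut:bottom_cut]))
        start = top.find(SITE, start + 1)
    return hits


# Hypothetical sequence: 2 nt, the site, 9 nt, a 4-nt overhang region, 6 nt.
example = "AA" + "GGATG" + "ACGTACGTA" + "CCGG" + "TTTTTT"
print(foki_cut_sites(example))  # [(2, 16, 20, 'CCGG')]
```

In this sketch the difference between the two offsets (13 - 9 = 4) gives the length of the single stranded overhang, and the overhang sequence is independent of the recognition sequence itself.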

As used herein, the language "restriction enzyme recognition domain" refers to a 
nucleotide sequence that allows a restriction enzyme to recognize this site and cut one or 
both strands of a nucleic acid segment at a fixed location with respect to the recognition
domain. For class IIS restriction endonucleases, the cut site lies x nucleotides outside
the recognition domain. Generally, the nucleotide sequence of the recognition domain is
about 4 to about 10, more preferably about 4 to about 6, nucleotides in length. For
example, for a class IIS restriction endonuclease, e.g., BseRI, the recognition domain is 6
nucleotides in length.

The language "enzyme cut site" refers to the location of a strand cleavage by an
enzyme where this cleavage occurs in a fixed location with respect to the restriction
enzyme recognition domain. For class IIS restriction endonucleases, the enzyme cut site
is located x nucleotides away from the recognition domain. In one embodiment, the
enzyme cut site is the site located the farthest from the restriction enzyme recognition
domain. Preferably, the enzyme cut site is the site located closest to the restriction
enzyme recognition domain.

"Enzyme" as the term is used in accordance with the invention means an enzyme, 
combination of enzymes, or other chemical reagents, or combinations of chemical reagents
and enzymes that, when applied to a ligated molecule, discussed more fully below,
cleaves the ligated molecule to generate a double stranded molecule having a single
stranded overhang sequence corresponding to a cut site. An enzyme of the invention
need not be a single protein, or consist solely of a combination of proteins. A key
feature of the enzyme, or of the combination of reagents employed as an enzyme, is that
its (their) cleavage site be separate from its (their) recognition site. It is important that
the enzyme cleave the nucleic acid segment after it forms a ligated molecule with its
recognition site; and preferably, the enzyme leaves a 5' or 3' protruding strand on the
nucleic acid segment after cleavage.

Preferably, enzymes employed in the invention are natural protein endonucleases
whose recognition site is separate from their cleavage site and whose cleavage results in a
protruding strand on the nucleic acid segment. Most preferably, class IIS restriction
endonucleases are employed as enzymes in the invention, e.g., as described in Szybalski
et al., Gene, 100:13-26 (1991); Roberts et al., Nucleic Acids Research, 21:3125-3137
(1993); and Livak and Brenner, U.S. Pat. No. 5,093,245. Class-IIS restriction
endonucleases are a subclass of class-II restriction endonucleases that cut at precise
distances away from their recognition domains, so that the recognition domains and
cleavage domains are separated on the substrate DNA molecule (Szybalski W, SC Kim,
N Hasan, AJ Podhajska Gene 1991; 100:13-26). Following digestion with class-IIS
restriction endonucleases, the sequence of the single-stranded end is independent of the
recognition domain sequence. Class-IIS restriction endonucleases usually have
asymmetric recognition domains, and class-IIS restriction endonucleases typically cut on
one side of the recognition domain, resulting in one double-stranded cut per recognition
site. Over 70 class-IIS restriction endonucleases have been isolated. Because the
cleavage domain is separate from the recognition domain, methylation of nucleotides
that lie within the cleavage domain will not affect cleavage, so long as the corresponding
recognition domain is not methylated (Podhajska AJ, W Szybalski Gene 1985; 40:175-182,
Podhajska AJ, SC Kim, and W Szybalski Methods in Enzymology 1992; 216:303-309,
Posfai G, W Szybalski Gene 1988; 69:147-151). Exemplary class IIS restriction
endonucleases for use with the invention include AccBSI, AceIII, AciI, AclWI, AlwI,
Alw26I, AlwXI, Asp26HI, Asp27HI, Asp35HI, Asp36HI, Asp40HI, Asp50HI, AsuHPI,
BaeI, BbsI, BbvI, BbvII, Bbv16II, Bce83I, BcefI, BcgI, Bco5I, Bco116I, BcoKI, BinI,
Bli736I, BpiI, BpmI, Bpu10I, BpuAI, BsaI, BsaMI, Bsc91I, BscAI, BscCI, Bse1I,
Bse3DI, BseNI, BseRI, BseZI, BsgI, BsiI, BsmI, BsmAI, BsmBI, BsmFI, Bsp24I,
Bsp423I, BspBS31I, BspIS4I, BspKT5I, BspLU11III, BspMI, BspPI, BspST5I,
BspTS514I, BsrI, BsrBI, BsrDI, BsrSI, BssSI, Bst11I, Bst71I, Bst2BI, BstBS32I,
BstD102I, BstF5I, BstTS5I, Bsu6I, CjeI, CjePI, Eam1104I, EarI, Eco31I, Eco57I,
EcoA4I, EcoO44I, Esp3I, FauI, FokI, GdiII, GsuI, HgaI, HphI, Ksp632I, MboII, MlyI,
MmeI, MnlI, Mva1269I, PhaI, PleI, RleAI, SapI, SfaNI, SimI, StsI, TaqII, TspII, TspRI,
Tth111II, and VpaK32I, and isoschizomers thereof. Preferred endonucleases include
FokI and BseRI.

Class-IIS restriction endonucleases have several applications, as outlined below.
Class-IIS restriction endonucleases have been used in conjunction with an adaptor to act
as a universal restriction endonuclease that can cut a single-stranded substrate at almost
any predetermined site (Podhajska AJ, W Szybalski Gene 1985; 40:175-182, Podhajska
AJ, SC Kim, and W Szybalski Methods in Enzymology 1992; 216:303-309, Szybalski
W. Gene 1985; 40:169-173). The adaptor consists of a double-stranded hairpin portion
containing the recognition domain for the class IIS restriction endonuclease, and a single
stranded end that is complementary to the single-stranded template to be cleaved.
Following annealing of the adaptor to the single-stranded template (e.g., M13), the
class-IIS restriction endonuclease can cleave this site. A hairpin adaptor has also been used to
attach a radiolabel to one end of a single-stranded phagemid DNA, to facilitate
Maxam-Gilbert sequencing (Goszcynski B, McGhee JD Gene 1991; 104:71-74).

Class-IIS restriction endonucleases have been used to trim vector inserts in order
to generate deletions in a vector insert (Mormeneo S, R Knott, D Perlman Gene 1987;
61:21-30, Hasan N, J Kur, W Szybalski Gene 1989; 82:305-311, Hasan N, SC Kim, AJ
Podhajska, W Szybalski Gene 1986; 50:55-62). In this application, restriction
endonuclease digestion removes a portion of the insert, and the resulting single-stranded
ends are converted to blunt ends prior to intra-molecular ligation and the transformation
of E. coli, generating a deletion mutant in the construct. If the class-IIS restriction
endonuclease recognition domain is reconstituted, this process can be carried out again,
generating a series of deletion mutants in the plasmid insert. This is not a sequencing
method, and the single-strand overhangs that could act as sequencing templates are
eliminated during the generation of each new plasmid construct.

Class-IIS restriction endonuclease digestion has been used as a mapping tool in a 
fluorescent fingerprinting procedure (Brenner S, Livak KJ Proc Natl Acad Sci USA 
1989; 86:8902-8906). In this method, 5' overhangs are generated by cleavage with a
class IIS restriction endonuclease, using the recognition domains that already exist in the
original DNA. Digestion is followed by labeling these ends using conventional dNTPs
and ddNTPs tagged with distinct fluorescent labels. This labeling constitutes
conventional Sanger sequencing with fluorescently labeled terminators. The restriction
fragments are then analyzed by denaturing polyacrylamide gel electrophoresis, with
detection of emissions using a DNA sequencer. The labeled fragments are characterized
by both size and terminal sequence, increasing the information content in DNA 
fingerprinting, allowing this method to distinguish restriction fragments that cannot be 
resolved by size alone. 

The ability of class-IIS restriction endonucleases to generate ambiguous ends has
also been used to amplify single restriction fragments from large DNA molecules
ranging from about 50 to 250 kb in size (Smith DR PCR Methods and Applications 1992;
2:21-27). In this method, digestion of the DNA molecule with a class-IIS restriction
endonuclease that generates a 5' overhang is followed by ligation to a single adaptor,
under conditions such that only a small subset of digested fragments have
single-stranded ends that will successfully mediate template-directed ligation to this single
adaptor. The ligated adaptor provides one target for subsequent PCR amplification of an
unknown fragment. The second target is provided by a vectorette unit (bubble-tag)
ligated to blunt ends produced by another restriction endonuclease. This strategy
permits the amplification of a single unknown fragment from the relatively complex
mixture. It is designed so that specific fragments can be isolated without prior
knowledge of the nucleotide sequence of the target. These amplified fragments arise
from random locations within the target. A similar strategy has been developed in which 
adaptors ligated to the class-IIS restriction endonuclease cut sites are called DNA 
indexers (Kato K. Nucleic Acids Research 1996; 24:394-395, Unrau P, Deugau KV 
Gene 1994; 145:163-169). 

Restriction endonuclease digestion is frequently used to generate cohesive ends
for cloning DNA segments into a vector. This can be accomplished by attaching
restriction endonuclease recognition domains to the ends of a DNA fragment by ligation
of a linker or adaptor. Alternatively, a recognition domain can be incorporated into the
end of a nucleic acid sequence using a primer whose 5' end contains the restriction
endonuclease recognition site of interest, followed by primer directed synthesis of the
opposite strand. One limitation inherent in such primer directed incorporation of a
restriction endonuclease recognition domain is that the fragment of interest cannot
contain the recognition domain for this enzyme if the intact fragment is to be cloned by
digestion with this restriction endonuclease, as cutting of internal sites would generate
shorter segments. This particular obstacle was solved by Han and Rutter (Han J, Rutter
WJ Nucleic Acids Res 1988; 16:11837). They incorporated a recognition domain for the
restriction endonuclease SfiI into an end of DNA segments by primer directed DNA
synthesis. A primer encoding this recognition domain was used during first strand
cDNA synthesis, but during this polymerization step methylated dCTP was substituted
for dCTP. This was followed by primer mediated synthesis of the opposite strand using
all four normal dNTPs. Since the SfiI recognition domain contains the cytosine
nucleoside, the primer extension with 5-methyl dCTP methylates one strand of each
recognition domain for SfiI lying outside of this primer sequence, blocking cleavage
mediated by any recognition domain lying outside of this primer sequence.
Hemi-methylation of the recognition domains lying outside of the primer sequence allowed
this restriction endonuclease to be used to clone intact segments containing recognition
domains for this restriction endonuclease.

Padgett and Sorge have adapted primer directed hemi-methylation of recognition
domains lying outside a primer encoded recognition domain to a polymerase chain
reaction (PCR) (Mullis K, Faloona F, Scharf S, Saiki R, Horn G, Erlich H. Cold Spring
Harbor Symposia on Quantitative Biology, Cold Spring Harbor Laboratory, LI:263-273)
format (Padgett KA, JA Sorge Gene 1996; 168:31-35). This strategy requires a
recognition domain in which each strand has at least one nucleotide that is not contained
in the other strand of this domain. A recognition domain with this characteristic allows
one to use primer extension during the polymerase chain reaction (PCR) to
hemi-methylate each of the recognition domains except for that recognition domain encoded
by the amplifying primer. This is accomplished by using a methylated nucleotide that is
not present in the recognition domain sequence that is antisense to the primer encoding
this domain. By using a methylated dNTP that does not lie in the strand antisense to the
recognition domain encoded in the amplifying primer, all the recognition domains in the
PCR product are methylated except the recognition domain that is encoded by the
amplifying primer. This strategy hemi-methylates each recognition domain in the PCR
product except the primer-encoded recognition domain. This approach has been applied
using a recognition domain for a class-IIS restriction endonuclease to generate
recombinant constructs (Padgett KA, JA Sorge Gene 1996; 168:31-35).
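
As an illustrative sketch only, the selection rule described above (a methylated nucleotide present in the primer-encoded strand of the recognition domain but absent from the strand antisense to it) can be expressed as a set difference. The function names and the example sites below are hypothetical. Whether a given enzyme is actually blocked by a particular methylated base must be established experimentally and is not modeled here.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the antisense strand, written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def candidate_methylated_bases(primer_strand_site):
    """Bases found in the primer-encoded strand of the recognition domain
    but not in the strand antisense to it.

    A methylated dNTP of such a base is never incorporated opposite the
    primer-encoded domain, yet is incorporated into one strand of every
    internal copy of the domain synthesized during amplification.
    """
    antisense = reverse_complement(primer_strand_site)
    return sorted(set(primer_strand_site) - set(antisense))


# Hypothetical asymmetric site: each strand has bases the other lacks.
print(candidate_methylated_bases("CTCTTC"))  # ['C', 'T']
# A palindromic site has identical strands, so no base qualifies.
print(candidate_methylated_bases("GATC"))    # []
```

An empty result corresponds to the stated requirement that each strand of the recognition domain contain at least one nucleotide absent from the other strand; a palindromic domain cannot satisfy it.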

The above described strategies permit a class-IIS recognition domain to be 
appended to the end of a DNA segment through primer extension, while hemi- 
methylating each recognition domain that lies within the original target, and they can be 
used to block cutting mediated by internal recognition domains without blocking cutting
mediated by the primer-encoded recognition domain. The two strategies outlined above
constitute portions of the preferred embodiments of the invention. 

Preferably, prior to enzyme digestion, usually at the start of the sequencing 
operation, the nucleic acid segment is treated by blocking the enzyme recognition 
domains of the enzyme being employed. The blocking prevents undesired cleavage of
the nucleic acid segment because of the fortuitous occurrence of enzyme recognition
domains at interior locations in the nucleic acid segment. Blocking can be achieved in a
variety of ways, including in vitro primer extension or in vitro primer extension with
hemi-methylation, e.g., in vitro DNA amplification, or methylation of the enzyme
recognition domain. For example, the DNA amplification can occur during or following
the amplification of the ligated molecule. Hemi-methylation can be achieved in a
variety of ways, including in vitro primer extension with a methylated nucleotide using a
primer having the portion of an enzyme recognition domain that blocks enzyme 
recognition if it is hemi-methylated. Preferably, the restriction endonuclease employed 
recognizes a hemi-methylated enzyme recognition domain and a primer contains at least 
one methylated nucleotide in the methylated portion of the recognition domain. 

Furthermore, internal sites can also be blocked by methylation of each strand of a
recognition domain, thereby allowing specific removal of a primer encoded recognition
domain. Waugh and Sauer have applied a genetic screen to isolate mutant FokI
restriction endonucleases that can cut via hemi-methylated FokI recognition domains,
but will not cut via doubly-methylated FokI recognition domains (Waugh, D.S., and
Sauer, R.T., J. Biol. Chem., 269:12298-12303 (1994)). These mutants retain a high
degree of specificity for the canonical recognition domain sequence intrinsic to the
native enzyme. Using one of these mutants, one could use a primer encoding the
recognition domain for FokI and carry out PCR amplification with 6-methyl dATP
substituted for dATP. This would doubly-methylate each recognition domain for FokI,
except for the primer encoded strand, which would be hemi-methylated, so that during
digestion with the mutant FokI, only the primer directed recognition domain would be
recognized and mediate cleavage. The primer directed domain need not contain the
entire recognition domain, but only the GGA portion of the upper strand GGATG FokI
recognition domain sequence, since this will prevent methylation of adenine in the
primer's upper strand recognition domain during PCR. The genetic screen strategy
outlined by Waugh and Sauer could also be used to isolate such mutants for other
class-IIS restriction endonucleases, thereby expanding the number of restriction endonuclease
recognition domains that can be appended to the end of DNA fragments during PCR
with concomitant blocking of only internal recognition domains.

The language "nucleic acid segment" refers to a double stranded or single
stranded polynucleotide of any length. In one embodiment of the invention, the nucleic
acid segment can contain a single stranded overhang, a nick or a gap. For example, the
nucleic acid segment of the invention can be a genomic DNA, a cDNA, a product of an
in vitro DNA amplification, e.g., a PCR product, a product of a strand displacement
amplification, or a vector insert. The length of the nucleic acid segment can vary
widely; however, for convenience of preparation, lengths employed in conventional
sequencing are preferred. Preferably, the nucleic acid segment of the invention is about
60 basepairs in length, more preferably it is about 100, 120, 150, 200, 300 or 600
basepairs in length, and most preferably it is about 1 to 2, or more kilobase pairs in
length. Examples of other ranges of lengths include: from about 60 basepairs to about 1
or 2 kilobase pairs; from about 60 basepairs to about 600 basepairs; from about 60
basepairs to about 200 or 300 basepairs; and from about 60 basepairs to about 120 or
150 basepairs.

The nucleic acid segments can be prepared by various conventional methods.
For example, the nucleic acid segments can be prepared as inserts of any of the
conventional cloning vectors, including those used in conventional DNA sequencing.
Extensive guidance for selecting and using appropriate cloning vectors is found in
Sambrook et al., Molecular Cloning: A Laboratory Manual, Second Edition (Cold
Spring Harbor Laboratory, New York, 1989), and the like references. Sambrook et al.
and Innis et al., editors, PCR Protocols (Academic Press, New York, 1990) also provide
guidance for using polymerase chain reactions to prepare nucleic acid segments.
Preferably, cloned or PCR-amplified nucleic acid segments are prepared which permit
attachment to magnetic beads, or other solid supports, for ease of separating the nucleic
acid segment from other reagents used in the method. Protocols for such preparative
techniques are described fully in Wahlberg et al., Electrophoresis, 13:547-551 (1992);
Tong et al., Anal. Chem. 64:2672-2677 (1992); Hultman et al., Nucleic Acids Research,
17:4937-4946 (1989); Hultman et al., Biotechniques, 10:84-93 (1991); Syvanen et al.,
Nucleic Acids Research, 16:11327-11338 (1988); Dattagupta et al., U.S. Pat. No.
4,734,363; Uhlen, PCT application PCT/GB89/00304. Kits are also commercially
available for practicing such methods, e.g., Dynabeads™ template preparation kit from
Dynal AS (Oslo, Norway).

In one preferred embodiment of the invention, the nucleic acid segment is 
attached to a solid matrix. As used herein, the term "solid matrix" refers to a material in 
a solid form to which a DNA molecule can attach. Examples of a solid matrix include a
magnetic particle, e.g., a magnetic streptavidin or a magnetic glass particle, a polymeric 
microsphere, a filter material, or the like. Preferably, the solid matrix used in the 
methods of the invention permits the sequential application of reagents to a DNA 
molecule without complicated and time-consuming purification steps. 

The nucleic acid segments of the invention can also be used to generate a
plurality of staggered double stranded nucleic acid molecules having a single stranded
overhang sequence. This is desirable when the sequencing interval is designed to be
more than one nucleotide, and one nucleotide is sequenced from a single template during
each cycle. The language "double stranded nucleic acid molecules having a single
stranded overhang sequence" is intended to include a nucleic acid molecule created by
the following method: attachment of an enzyme recognition domain at different
positions within an interval of a selected double stranded nucleic acid segment, and
digestion of the selected double stranded nucleic acid segment with a corresponding
restriction enzyme. Preferably, the interval is no greater than the distance between a
restriction enzyme recognition domain and an enzyme cut site. The resulting double
stranded nucleic acid molecules having a single stranded overhang sequence constitute a
plurality of staggered double stranded nucleic acid molecules. The single strand
overhang sequence in the staggered nucleic acid molecule may be either 5' or 3'.
Preferably, the number of nucleotides in the overhang portion of the strand is in the
range from about 2 to about 6 nucleotides depending on the enzyme used to digest the
nucleic acid segment. 

The language "sequencing an interval within a double stranded nucleic acid 
segment" is intended to include the sequencing which occurs by identifying nucleotides 
n and n + x in a plurality of staggered double stranded molecules produced from the 
selected double stranded nucleic acid segment. This allows one to sequence all of the
nucleotides in a selected nucleic acid segment between the nucleotide n and nucleotide n
+ x. For example, for a class IIS restriction enzyme, e.g., FokI, that has a restriction
enzyme recognition domain nine nucleotides away from its enzyme cut site, i.e., x = 9,
starting with nine staggered double stranded nucleic acid molecules will generate
sequence information for all nucleotides found in the interval between nucleotide n and
nucleotide n + x. 
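
The coverage argument can be checked with a short sketch, assuming an idealized model in which a template whose first identified nucleotide lies at position offset yields positions offset, offset + x, offset + 2x, and so on over successive cycles. The function names and the segment length of 30 are hypothetical and chosen only for illustration.

```python
def positions_read(offset, x, length):
    """Positions identified from one template over successive cycles."""
    return list(range(offset, length, x))


def staggered_coverage(x, length):
    """Map each position to the staggered template (offset) that reads it."""
    covered = {}
    for offset in range(x):
        for pos in positions_read(offset, x, length):
            covered[pos] = offset
    return covered


x, length = 9, 30
coverage = staggered_coverage(x, length)
print(positions_read(0, x, length))             # [0, 9, 18, 27]
print(sorted(coverage) == list(range(length)))  # True
```

Under this model a single template reads only every ninth position, while nine templates staggered by one nucleotide each read every position exactly once.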

The staggered double stranded nucleic acid molecules having a single stranded 
overhang sequence can be prepared by various methods. For example, they can be 
generated by ligation of the initial nucleic acid segment to each of several adaptors with
offset class-IIS recognition domains (Wu R, T Wu, R Anuradh, Methods in Enzymology
1987; 152:343-349). This initial DNA segment to be sequenced can be a PCR product or
a vector insert. If the PCR product is amplified using a DNA polymerase with terminal
extendase activity, the resulting single nucleotide 3' overhang can be removed using a
DNA polymerase with 3' exonuclease activity, such as T4 DNA polymerase or Pfu DNA
polymerase, prior to blunt end ligation to adaptors (Costa GL, MP Weiner, Nucleic Acids
Research 1994; 22:2423). Offset recognition domains can also be encoded into the
amplification primers (Mullis K, Faloona F, Scharf S, Saiki R, Horn G, Erlich H., Cold 
Spring Harbor Symposia on Quantitative Biology, Cold Spring Harbor Laboratory, 
LI:263-273), resulting in distinct amplification products with offset recognition domains. 

20 There are a variety of ways in which offset recognition domains can be appended 

to each of numerous inserts in a DNA library. For example, if a complete digest were 
carried out on genomic DNA with the frequent cutter Saul AI, followed by a partial fill- 
in with dGTP and dATP, each insert would contain non-self-complementary DNA ends 
(Hung M-C, PC Wensink. Nucleic Acids Research. 1984; 12: 1863-1874). The vector 

25 could be digested with Sail and undergo a partial fill-in reaction with dCTP and dTTP, 
resulting in linearized vectors with non-self-complementary DNA ends. In this case 
each insert DNA end is complementary to each vector DNA end, so that during DNA 
ligation with cut and partially filled-in inserts and vectors, the vast majority of the 
resulting clones will contain one insert (Zabarovsky ER, RL Allikmets. Gene. 1986; 42: 

30 1 19-123). Following the isolation of individual clones, each insert can undergo PCR 
amplification using primers that anneal to the vector sequence, with one of the primers
disabling the Sau3AI site in one side of each amplified insert by having a base mismatch
to the Sau3AI site near its 3' end, or, preferably, a methylated nucleotide in the 3' end
region of the primer (this primer's 3' end encoding at least part of the Sau3AI recognition
domain (GATC), so that it will prime efficiently and its methylated nucleotide will block
Sau3AI cutting of this end of the PCR product, allowing cutting of the opposite end of
the PCR product). If the adenine is methylated, cutting can be done using MboI or
DpnII, which share the recognition domain of Sau3AI but are blocked by dam
methylation. Following digestion, one end of each insert will have a four nucleotide
long end that can undergo ligation to an initial adaptor, so that ligations to distinct initial
adaptors can append staggered recognition domains (for the class-IIS restriction
endonuclease that will be used for sequencing) to each of the numerous inserts in the
library.
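
The compatibility of the partially filled-in insert and vector ends described above can be checked with a short sketch. The Python below is an illustration only and is not part of the disclosed method; the function names (partial_fill_in, reverse_complement) are hypothetical. It assumes that the polymerase extends the recessed 3' end base by base along the 5' overhang, starting opposite the overhang base nearest the duplex, and stops at the first position whose complementary dNTP is absent from the reaction.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def partial_fill_in(overhang, dntps):
    """Return the 5' overhang (written 5'->3') left after a partial fill-in.

    The recessed 3' end is extended starting opposite the overhang base
    nearest the duplex (the 3'-most overhang base) and stalls at the first
    template base whose complementary dNTP is not supplied.
    """
    remaining = list(overhang)
    while remaining and COMPLEMENT[remaining[-1]] in dntps:
        remaining.pop()  # this template base is now paired
    return "".join(remaining)


# Sau3AI leaves a 5'-GATC overhang; SalI leaves a 5'-TCGA overhang.
insert_end = partial_fill_in("GATC", {"G", "A"})  # fill-in with dGTP + dATP
vector_end = partial_fill_in("TCGA", {"C", "T"})  # fill-in with dCTP + dTTP

print(insert_end, vector_end)
```

Under these assumptions the insert keeps a two nucleotide overhang that pairs with the vector overhang but not with itself, which is the property the cloning strategy relies on to favor one insert per vector.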

An alternative approach is to generate a library of clones using
randomly sheared DNA. These DNA fragments can be dephosphorylated and efficiently
cloned with one insert per vector using a vector that requires inactivation of a selectable
marker by DNA insertion to be viable in a given E. coli host (Bernard P. BioTechniques.
1996; 21: 320-323). Alternatively, a pool of inserts can be size selected over an agarose
gel prior to cloning into a vector (Fleischmann RD, et al. Science. 1995; 269: 496-512).
Using either approach, or other cloning strategies, each vector insert could be amplified
using one primer that contains a strand of the recognition domain up to and including a
methylated nucleotide for a restriction endonuclease that recognizes a hemi-methylated
domain but does not recognize a non-methylated domain. This can be accomplished by
using a primer that has one strand of the recognition domain sequence, with at least one
methylated nucleotide, so that digestion with the corresponding restriction endonuclease
will cut that one end of each amplified product, and no other sites. This can be carried
out by amplification with a primer that contains one strand of the recognition domain for
DpnI (with a methylated adenine). This strategy allows PCR amplification with normal
nucleotides, as products synthesized from normal nucleotides leave internal DpnI
recognition domains unmethylated and therefore uncut. Alternatively, each end could be
amplified and digested using the
strategy of Padgett and Sorge (Padgett KA, JA Sorge, Gene 1996; 168:31-35), with either
a regular class-II restriction endonuclease or with a class-IIS restriction endonuclease.

In this method, the opposite end of each nucleic acid segment is shared between
each of the initial template precursors for a given nucleic acid segment to be sequenced.
Each initial template precursor is attached to a solid matrix. A wide range of methods
have been used to bind DNA to a solid matrix. If the template precursor is a PCR
product, one primer can contain a moiety that is used to attach the PCR product to a
solid matrix. For example, this primer can contain a biotin moiety or another reactive
moiety such as an amine group or thiol group, permitting the attachment of the PCR
product to a solid matrix (Syvanen AC, M Bengstrom, J Tenhunen and H Soderlund,
Nucleic Acids Research 1988; 16:11327-11338; Stamm S, J Brosius, Nucleic Acids
Research 1991; 19:1350; Lund V, R Schmid, D Rickwood and E Hornes, Nucleic Acids
Research 1988; 16:10861-10880; Fahy E, GR Davis, LJ DiMichele, SS Ghosh, Nucleic
Acids Research 1993; 21:1819-1826; and Kohsaka H, A Taniguchi, DD Richman, DA
Carson, Nucleic Acids Research 1993; 21:3469-3472). The solid matrix can be either
immobile or dispersible. For example, for a DNA segment with a biotinylated end, an
immobile solid matrix can be an avidin or streptavidin coated microtiter plate (Jeltsch A,
A Fritz, J Alves, H Wolfes, A Pingoud, Analytical Biochemistry 1993; 213:234-240;
Holmstrom K, L Rossen, OF Rasmussen, Analytical Biochemistry 1993; 209:278-283)
or manifold support (Lagerkvist A, J Stewart, M Lagerstrom-Fermer, U Landegren, Proc
Natl Acad Sci USA 1994; 91:2245-2249). The most readily available dispersible solid
matrix is beads that can be suspended through shaking. Beads can be designed to be
magnetically pelleted (Lund V, R Schmid, D Rickwood and E Hornes, Nucleic Acids
Research 1988; 16:10861-10880; Hultman T, S Stahl, E Hornes, M Uhlen, Nucleic Acids
Research 1989; 17:4937-4946; Dawson BA, T Herman, J Lough, Journal of Biological
Chemistry 1989; 264:12830-12837) or they can be pelleted through centrifugation
(Syvanen AC, M Bengstrom, J Tenhunen and H Soderlund, Nucleic Acids Research
1988; 16:11327-11338; Stamm S, J Brosius, Nucleic Acids Research 1991; 19:1350).
Use of a dispersible solid matrix diminishes steric obstacles in enzymatic reactions, and
facilitates removal of a small aliquot to be amplified. An alternative approach that
allows a small aliquot of a reaction to be removed and used as a template for
amplification is to use a method of reversible capture. Reversible capture can be
accomplished by using a cleavable linkage arm, such as a chemically cleavable linkage
arm or a photocleavable linkage arm (Dawson BA, T Herman, J Lough, Journal of
Biological Chemistry 1989; 264:12830-12837; Olejnik J, E Krzymanska-Olejnik, KJ
Rothschild, Nucleic Acids Research 1996; 24:361-366), by using a primer-encoded
DNA binding domain that can be unbound by denaturation (Lew AM, DJ Kemp, Nucleic
Acids Research 1989; 17:5859; Kemp DJ, DB Smith, SJ Foote, N Samaras, MG
Peterson, Proc Natl Acad Sci USA 1989; 86:2423-2427; Kemp DJ, Methods in
Enzymology 1992; 216:116-126), or by the generation of a single stranded end during
PCR, as such an end can reversibly anneal to its complement that is bound to a solid
phase (Newton CR, D Holland, LE Heptinstall, I Hodgson, MD Edge, AF Markham, MJ
McLean, Nucleic Acids Research 1993; 21:1155-1162; Khudyakov YE, L Gaur, J Singh,
P Patel, HA Fields, Nucleic Acids Research 1994; 22:1320-1321).

Another important aspect of the invention is the adaptor employed within the 
present invention. An adaptor of the invention is a double stranded or a single stranded 
polynucleotide having one or more of a cycle identification tag, a restriction enzyme 
recognition domain and a sequence identification region. Preferably, the adaptor may
also include a detectable label, which in the particular embodiment of Figure 1 is 
illustrated at the end opposite of the sequence identification region. 

As used herein, the language "a cycle identification tag" refers to a unique
nucleotide sequence that generates a primer annealing site, and a primer can anneal
either to the unique sequence or its complement. The cycle identification tag is of a
length which allows it to perform its intended function. Examples of lengths include:
from about 8 to about 60 nucleotides in length; from about 8 to about 30 or 40
nucleotides in length; and from about 8 to about 15 or 20 nucleotides in length. Ligation
of this unique sequence to each double stranded nucleic acid segment having the single
stranded overhang sequence permits regeneration of each nucleic acid segment using
primer-directed DNA amplification in vitro (e.g., PCR), ameliorating the major 
limitations inherent in iterative methods for product generation, e.g., product losses and 
the accumulation of incompletely processed products. 

The language "restriction enzyme recognition domain" has been defined above. 
In one embodiment of the invention, the adaptor contains only a single strand of a
restriction enzyme recognition domain, because a single strand of the domain can
function as a template for the generation of a double stranded restriction enzyme
recognition domain through hybridization to its complement or through template- 
directed polymerase generation of its complement. 

As used herein, the language "sequence identification region" refers to a region
used to identify nucleotide n and/or nucleotide n + x in a selected nucleic acid segment.
Preferably, the region used to identify nucleotide n and/or nucleotide n + x is a
protruding nucleotide strand, e.g., a 5' or a 3' nucleotide strand. In one embodiment of
the invention, the sequence identification region is capable of forming a duplex with the
single stranded overhang sequence of the double stranded nucleic acid segment.
Preferably, the sequence identification region comprises a number of degenerate
nucleotides, usually between 1 and 4 degenerate nucleotides. In addition, the sequence
identification region can also include a fixed nucleotide, e.g., a nucleotide whose
identity is known, at its most terminal position. Preferably, at each cycle, only those
adaptors whose sequence identification regions form duplexes with the single stranded
overhang sequence of the double stranded nucleic acid segment are hybridized to the
one end of the nucleic acid segment to form a ligated molecule.

As used herein, the term "a ligated molecule" refers to a double stranded 
structure formed after the sequence identification region of an adaptor and the single 
strand overhang sequence of the nucleic acid segment anneal and at least one pair of the
identically oriented strands of the adaptor and the nucleic acid segment are ligated, i.e.,
are caused to be covalently ligated to one another. In one embodiment of the invention, 
the ligated molecule is labeled with a detectable label on at least one strand of the 
molecule and detection occurs following the removal of an unligated labeled adaptor. In 
other embodiments, the ligated molecule is formed following a blunt end ligation. 

As used herein, the term "hybridization" refers to annealing of a nucleic acid
sequence to its complement. Hybridization can occur in the presence of a non-annealing
region or a nucleotide analog. In one embodiment of the invention, hybridization can
also entail ligation. In another embodiment of the invention, hybridization precedes
ligation. The term "ligation," as used herein, refers to the joining of two molecules using
conventional procedures known in the art. Ligation can be accomplished either
enzymatically or chemically. Chemical ligation methods are well known in the art, e.g.,
Ferris et al., Nucleosides & Nucleotides, 8:407-414 (1989); Shabarova et al., Nucleic
Acids Res. 19:4247-4251 (1991). Preferably, however, ligation is carried out
enzymatically using a ligase in a standard protocol. Many ligases are known and are
suitable for use in the present invention, e.g., Lehman, Science 186:790-797 (1974);
Boyer, ed., The Enzymes Vol. 15B (Academic Press, New York, 1982). Preferred
ligases include nucleic acid ligases, e.g., T4 DNA ligase, T7 DNA ligase, E. coli DNA
ligase, Taq ligase, Pfu ligase and Tth ligase. Protocols for their use are well known, e.g.,
Sambrook et al. Molecular Cloning: A Laboratory Manual, 2nd Edition (Cold Spring
Harbor Laboratory, New York, 1989); Barany, PCR Methods and Applications 1:5-16
(1991). Generally, ligases require that a 5' phosphate group be present for ligation to
the 3' hydroxyl of an abutting strand. This is conveniently provided for at least one
strand of the nucleic acid segment by selecting a restriction endonuclease which leaves a
5' phosphate, e.g., a FokI restriction endonuclease. For example, T4 DNA ligase is
highly specific in its ability to ligate the 3' end of one oligonucleotide to the
phosphorylated 5' end of another oligonucleotide using a DNA template, because a
mismatch between the oligonucleotide substrates at the ligation junction greatly reduces
the ligation efficiency (Alves AM, FJ Carr, Nucleic Acids Res 1988; 16:8723; Wu DY,
RB Wallace, Gene 1989; 76:245-254; Somers VAMC, PTM Moerkerk, JJ Murtagh, Jr.,
and FBJM Thunnissen, Nucleic Acids Research 1994; 22:4840-4841; and Samiotaki M,
M Kwiatkowski, J Parik and U Landegren, Genomics 1994; 20:238-242). This permits
highly selective ligation of an oligonucleotide whose end nucleotide is complementary
to the template at the ligation junction, allowing template-directed DNA ligation to
discriminate between single nucleotides in a designated position of the DNA template.
This forms the basis for point mutation discrimination by the ligase chain reaction using
either T4 DNA ligase (Wu DY, RB Wallace, Genomics 1989; 4:560-569) or a heat-
stable DNA ligase (Barany F. Proc Natl Acad Sci USA 1991; 88:189-193). E. coli DNA
ligase can also discriminate between mismatches at a ligation junction (Kato K, Nucleic
Acids Research 1996; 24:394-395), and other DNA ligases can be anticipated to share
this characteristic. The ligase chain reaction, and related earlier methods for nucleotide
discrimination using a DNA ligase, detect point mutations at a single position. Each
position assessed requires a unique set of annealing oligonucleotides, so that a method
based solely on DNA ligation steps can only provide very limited sequence information.
Attachment of an adaptor sequence to the complement of a DNA template can occur
through primer extension, and this attachment of a sequence to a DNA segment is
considered a ligation. This can occur, for example, during PCR amplification (Mullis,
K., Faloona, F., Scharf, S., Saiki, R., Horn, G., Erlich, H., Cold Spring Harbor Symposia
on Quantitative Biology, Cold Spring Harbor Laboratory, LI:263-273 (1986)). Such
attachment through polymerase extension has been referred to as a ligation of the primer
sequence to the polymerase product by other investigators (Brenner, S., International
Publication Number WO/12039, page 18, lines 35-38 (1996)), and such attachment of a
primer sequence can occur using a short overhang as a template during PCR
amplification (Upcroft, P., Healy, A., Nucleic Acids Research 21:1854 (1993)) or during
a single primer extension (Fu, D.J., Broude, N.E., Koster, H., Smith, C.L., Cantor, C.R.,
Proc. Natl. Acad. Sci. USA, 92:10162-10166 (1995)).

In another embodiment of the invention, template-directed polymerization is
used instead of the template-directed ligation described above. For example, a double
stranded molecule having a single stranded overhang sequence generated following FokI
digestion can be sequenced by template-directed polymerization in the presence of four
dideoxynucleotide terminators (ddNTPs), each tagged with a distinct fluorescent label.
Following polymerization and washing, which removes unincorporated terminators,
identification of the incorporated terminator can be accomplished by fluorometry,
revealing the identity of nucleotide n in the nucleic acid segment.
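
The terminator read-out just described can be sketched in a few lines. This is a hedged illustration, not the disclosed protocol: the dye names are placeholders, the function names are hypothetical, and the model assumes that the polymerase adds exactly one terminator to the recessed 3' end, opposite the 5' overhang base that lies nearest the duplex.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Placeholder dye assignment (an assumption; any four distinct labels would do).
DYE_FOR_TERMINATOR = {"A": "dye_1", "C": "dye_2", "G": "dye_3", "T": "dye_4"}
TERMINATOR_FOR_DYE = {dye: base for base, dye in DYE_FOR_TERMINATOR.items()}


def incorporated_terminator(overhang_5p):
    """Return the ddNTP base added opposite the overhang base nearest the duplex.

    overhang_5p is the template's 5' overhang written 5'->3', so its last
    base is the first one copied by the polymerase.
    """
    return COMPLEMENT[overhang_5p[-1]]


def call_template_base(observed_dye):
    """Infer the template base from the dye seen after washing."""
    return COMPLEMENT[TERMINATOR_FOR_DYE[observed_dye]]


overhang = "GATC"  # example 5' overhang, written 5'->3'
terminator = incorporated_terminator(overhang)
dye = DYE_FOR_TERMINATOR[terminator]
called = call_template_base(dye)
print(terminator, dye, called)
```

Because each terminator carries its own label, a single fluorescence reading is enough to call the template base at that position in this simplified model.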

After adaptor ligation, an enzyme recognizing the adaptor via the enzyme
recognition domain digests the ligated molecule at a site one or more nucleotides from
the ligation site along the nucleic acid segment, leaving a double stranded molecule having
a single strand overhang sequence, corresponding to the cut site, that is capable of
participating in the next cycle of ligation and digestion.

As used herein, the term "amplify" refers to an in vitro method which can be
used to generate multiple copies of a nucleic acid, e.g., a DNA duplex or single-stranded
DNA molecule, its complement, or both. Amplification techniques, therefore, include
both cloning techniques, as well as PCR based amplification techniques. Preferably, the
nucleic acid amplification is linear or exponential, e.g., PCR amplification or strand
displacement amplification. These techniques are well known to those of skill in the art.
Amplification products are compositions which include a greater number of properly
ligated molecules than the number of original nucleic acid segments.

The term "primer" refers to a linear oligonucleotide which specifically anneals to
a unique polynucleotide sequence and allows for amplification of that unique
polynucleotide sequence. In one embodiment of the invention, the primer specifically
anneals to the unique sequence in a cycle identification tag and allows for amplification
of a ligated molecule. The primer is of a length which allows it to perform its intended
function. Examples of lengths include: from about 8 to about 60 nucleotides in length;
from about 8 to about 30 or 40 nucleotides in length; and from about 8 to about 15 or 20
nucleotides in length. In one embodiment of the invention, a primer is said to encode a
restriction endonuclease recognition domain if it contains a portion of that recognition
domain, when the primer undergoes primer extension to generate a complete strand of
that recognition domain.

A strategy can be implemented to remove one of the amplifying primers, and its
complement, from each product of amplification, e.g., PCR amplification, thus
preventing the sequencing of DNA encoded by this primer.

Selective removal of primer encoded sequence from a PCR product can be
accomplished by restriction endonuclease digestion, without cutting internal recognition
domains, using the method of Padgett and Sorge (Padgett KA, JA Sorge, Gene 1996;
168:31-35), as described herein. Alternatively, a primer can encode the recognition
domain for a restriction endonuclease that requires a methylated nucleotide for cleavage,
and recognizes a hemi-methylated recognition domain (see Example 4). Using this
strategy, only the primer directed end is cut by the restriction endonuclease because only
the primer encoded recognition domain is methylated. Therefore, this strategy does not
require substitution of a free methylated nucleotide for the corresponding non-
methylated nucleotide in the PCR mixture, or the recognition domain to contain fewer than
all four nucleotides in a given strand, distinguishing it from the method of Padgett and
Sorge.

Technology for removing primer encoded sequence from PCR products can also
be used to facilitate the generation of initial nucleic acid segments from clone libraries.
For example, the restriction endonuclease recognition domain can be incorporated into
the vector adjacent to or within several basepairs of each vector insert, as already
described, so that following PCR amplification, restriction endonuclease digestion is
used to remove primer encoded sequence, prior to ligation of initial adaptors (containing
offset recognition domains for the class-IIS restriction endonuclease
used for sequencing). This will facilitate sequencing of clone libraries because
sequencing cycles will not be wasted sequencing the removed primer encoded end of
PCR amplified vector inserts. Once a class-IIS restriction endonuclease is discovered that
requires a methylated nucleotide and recognizes a hemi-methylated recognition domain,
the strategy of using a methylated primer to hemi-methylate the recognition domain in
only that primer encoded end of a PCR product will be the predominant method for
removing an entire primer sequence from PCR products in those applications for which
current class-IIS restriction endonucleases are used, including for the generation of site-
directed mutants and recombinant constructs (Beck R, H Burtscher, Nucleic Acids
Research 1994; 22:886-887; Stemmer WPC, SK Morris, BS Wilson, BioTechniques
1993; 14:256-265; Stemmer WPC, SK Morris, CR Kautzer, BS Wilson, Gene 1993;
123:1-7; Tomic M, I Sunjevaric, ES Savtchenko, M Blumenberg, Nucleic Acids
Research 1990; 18:1656).

Removal of the amplifying primer can also be accomplished by incorporating a
dUTP at the 3' end of this amplifying primer. dUTP is a nucleotide analog that is readily
available and can be incorporated into a primer sequence at or near its 3' end during
oligonucleotide synthesis. dUTP can prime from the extreme 3' end of a primer even
when mismatched (Kwok S, S-Y Chang, JJ Sninsky, A Wang, PCR Methods and
Applications 1994; 3:S39-S47). Uracil DNA Glycosylase is used to cleave the N-
glycosylic bond between the deoxyribose moiety and uracil, resulting in an abasic site
(Varshney U, T Hutcheon, JH van de Sande, J Biol Chem 1988; 263:7776-7784).
Subsequent heating hydrolyzes the DNA strand at this site, generating a phosphorylated
5' end at the nucleotide located immediately 3' to the dUMP in the original primer, and
this phosphorylated 5' end can undergo DNA ligation (Day PJR, MR Walker, Nucleic
Acids Res 1991; 19:6959; Liu HS, HC Tzeng, YJ Liang, and CC Chen, Nucleic Acids Res
1994; 22:4016-4017). Heating to hydrolyze the primer at the abasic site also removes
nucleotides located 5' to the dUMP in the original primer, resulting in a 5'
phosphorylated end with a 3' overhang sequence.
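
The outcome of the uracil-based primer removal can be pictured with a small string model. The sketch below is illustrative only; the function name cleave_at_dU and the example sequences are hypothetical, and the model assumes a single deoxyuridine (written "U") in the primer-derived strand, with strand scission at the abasic site releasing everything from the 5' end through that position.

```python
def cleave_at_dU(strand):
    """Split a strand (written 5'->3') at its single dU residue.

    Returns (removed, retained): the 5' portion through the dU, which is
    lost after glycosylase treatment and heating, and the 3' portion, whose
    first nucleotide carries the new 5' phosphate.
    """
    idx = strand.index("U")  # raises ValueError if no dU is present
    removed = strand[: idx + 1]
    retained = strand[idx + 1:]
    return removed, retained


primer = "GCTAGCCTAUG"   # hypothetical primer with dU one base from its 3' end
extension = "ACCGT"      # hypothetical polymerase-extended sequence
removed, retained = cleave_at_dU(primer + extension)
print(removed, retained)
```

In this model the complementary strand is untouched, so the region that was paired with the removed portion is left single stranded as a 3' overhang, consistent with the description above.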

An alternative method for removing the primer uses a primer with a 3' terminal
ribose residue. A 3' terminal ribose residue is incorporated into the primer using the
RNA residue as the solid support during standard phosphoramidite synthesis, and the 3'
terminal ribose does not interfere with PCR amplification (Walder RY, JR Hayes, JA
Walder, Nucleic Acids Res 1993; 21:4339-4343; Silveira MH, and LE Orgel, Nucleic
Acids Res 1995; 23:1083-1084). Following PCR amplification, a ribose linkage is
created in the PCR product that can be readily cleaved by alkaline treatment or by
digestion with RNase A for 3'-terminal ribose residues that are C or U. Cleavage of the
ribose linkage results in a 3' overhang sequence.

Using either method for primer removal, generation of a blunt end suitable for
ligation to an adaptor can then be accomplished by incubating with a single-strand
specific nuclease (e.g. mung bean nuclease), or with a DNA polymerase with a 3'
exonuclease activity (e.g. T4 DNA Polymerase) in the presence of the four dNTPs
(Stoker AW, Nucleic Acids Res 1990; 18:4290), permitting the removal of a primer
sequence and its complement from PCR products prior to sequencing. Following
adaptor ligation, a subsequent PCR step can use the ligated adaptor to generate a primer
annealing site, so that only successfully ligated products are regenerated. Using any of
the above strategies, with or without removal of one of the initial primers and its
complement, initial template precursors can be generated.

As is described more fully below, in the course of such cycles of ligation and
digestion preferably the first or farthest unpaired nucleotide in the overhang sequence of
the double stranded nucleic acid segment is identified. For example, this nucleotide can
be identified using an adaptor with a detectable label. As used herein, the term
"detectable label" refers to a material that can attach to a DNA molecule and generate a
signal. The adaptors may be labeled by a variety of means and at a variety of locations.
The adaptors of the invention can be labeled by methods known in the art, including the
direct or indirect attachment of radioactive labels, fluorescent labels, colorimetric labels,
chemiluminescent labels and the like, as described in Matthews et al., Anal Biochem.,
Vol. 169, pgs. 1-25 (1988); Haugland, Handbook of Fluorescent Probes and Research
Chemicals (Molecular Probes, Inc., Eugene, 1992); Keller and Manak, DNA Probes, 2nd
Edition (Stockton Press, New York, 1993); Eckstein, editor, Oligonucleotides and
Analogues: A Practical Approach (IRL Press, Oxford, 1991); Wetmur, Critical Reviews
in Biochemistry and Molecular Biology, 26:227-259 (1991); and the like. Many more
particular methodologies applicable to the invention are disclosed in the following
sample of references: Connolly, Nucleic Acids Research, Vol. 15, pgs. 3131-3139
(1987); Gibson et al., Nucleic Acids Research, Vol. 15, pgs. 6455-6467 (1987); Sproat et
al., Nucleic Acids Research, Vol. 15, pgs. 4837-4848 (1987); Fung et al., U.S. Patent No.
4,757,141; Hobbs, Jr. et al., U.S. Patent No. 5,151,507; Cruickshank, U.S. Patent No.
5,091,519 (synthesis of functionalized oligonucleotides for attachment of reporter
groups); Jablonski et al., Nucleic Acids Research, 14:6115-6128 (1986) (enzyme-
oligonucleotide conjugates); and Urdea et al., U.S. Patent No. 5,124,246 (branched
DNA). Preferably, the adaptors are labeled with one or more fluorescent dyes, e.g., as
described in US Patent No. 5,188,934 and PCT application PCT/US90/05565. In a
preferred embodiment of the invention, the adaptor is attached to a solid matrix, such as
a magnetic particle, e.g., magnetic streptavidin or magnetic glass particle, polymeric
microsphere, filter material, or the like. Incorporation of label and sequencing can also
occur following adaptor ligation, using primer-directed incorporation of a label. In this
case, labeled primers have 3' ends that discriminate between nucleotides at the position
of interest. This approach, called competitive oligonucleotide priming, has been used to
identify mutations using PCR (Gibbs, R.A., Nguyen, P.N., and Caskey, C.T., Nucleic
Acids Research 17: 2437-2448 (1989)).

Figures 1, 2, 3 and 4 illustrate four embodiments of the present invention. Figure
1 illustrates the use of a class-IIS restriction endonuclease that generates a 5' overhang,
and sequences a nucleotide at each interval by template-directed ligation. In Figure 1,
this embodiment is illustrated using the class-IIS restriction endonuclease FokI, and the
template precursor has a biotinylated end that allows it to be bound to streptavidin. In
Step 1, the template precursor is cleaved with FokI. FokI has the following recognition
domain and cut site:

5' GGATG (N)9
3' CCTAC (N)13

FokI generates a four nucleotide long 5' overhang positioned nine nucleotides away from
one side of the recognition domain, so that sequencing can be carried out in intervals of
nine nucleotides. FokI digestion cleaves both strands of the double-stranded DNA,
generating a DNA template with a 5' overhang sequence. The bound template is washed
to remove the cleaved ends. In Step 2 the 5' overhang sequence mediates ligation to one
of four adaptors. These adaptors contain the sequence for the recognition domain for
FokI and have an adjacent four nucleotide long and phosphorylated 5' overhang
consisting of three nucleotides with 4-fold degeneracy and a 5' terminus with one of the
four normal nucleotides. Since the four adaptors each have three degenerate nucleotides
and four distinct 5' terminal nucleotides, there are 256 (4 x 4 x 4 x 4) distinct sequences.
The adaptors
shown are double-stranded, because this increases the ligation efficiency, probably due
to stacking interactions (Lin S-B, KR Blake, PS Miller, Biochemistry 1989; 28:1054-
1061). In this embodiment of the method there is one ligation reaction during each
sequencing cycle. In each ligation, all four adaptors are present, and each adaptor is
preferably tagged with a distinct fluorescent label (e.g. Fam-NHS ester, Rox-NHS ester,
Tamra-NHS ester, or Joe-NHS ester; Applied Biosystems Division of Perkin-Elmer,
Foster City CA), each label identifying the nucleotide at the single-stranded 5' end of the
adaptor. Ligation occurs for the adaptor for which the above mentioned 5' nucleotide is
complementary to the nucleotide on the 5' end of the DNA template at the ligation
junction. Following ligation, and washing to remove the unligated adaptors,
identification of the ligated adaptor can be accomplished by fluorometry, revealing the
sequence of the DNA template at the ligation junction (Step 3). In Step 4, the ligated
template from Step 2 undergoes PCR amplification using a biotinylated primer and
a primer that is complementary to a unique portion of the adaptor's ligated lower
strand. An alternative approach would sequence via ligation of the adaptor's upper
strand. In this approach, the fixed nucleotide in the single strand extension in each
adaptor is the fourth nucleotide 3' to the 5' end. The label is preferably in the upper
strand, and this label identifies the lower strand's fixed nucleotide in the single strand
overhang, with the remaining nucleotides in this single strand being promiscuous
nucleotides (degenerate or universal nucleotides). In this embodiment of the invention,
one of the primers would be homologous to a unique portion of the adaptor's ligated
upper strand.
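
The count of 256 adaptor overhang sequences, and the way a labeled pool reports one template base, can be made concrete with a brief sketch. This is an illustration under stated assumptions rather than the disclosed procedure: the function names are hypothetical, ligation is modeled as requiring full Watson-Crick pairing across all four overhang positions, and the adaptor's fixed 5' terminal base is taken to pair with the template overhang base nearest the template duplex (the consequence of antiparallel pairing between two 5' overhangs).

```python
from itertools import product

BASES = "ACGT"
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def adaptor_pools(n_degenerate=3):
    """Map each fixed 5' terminal base to the overhangs in that labeled pool."""
    return {
        fixed: [fixed + "".join(rest) for rest in product(BASES, repeat=n_degenerate)]
        for fixed in BASES
    }


def ligating_pool(template_overhang, pools):
    """Return the fixed base of the pool holding the fully paired overhang."""
    needed = reverse_complement(template_overhang)
    for fixed, overhangs in pools.items():
        if needed in overhangs:
            return fixed
    raise ValueError("no adaptor overhang pairs with this template overhang")


pools = adaptor_pools()
total = sum(len(overhangs) for overhangs in pools.values())

template_overhang = "AGTC"  # example template 5' overhang, written 5'->3'
fixed_base = ligating_pool(template_overhang, pools)
template_base = COMPLEMENT[fixed_base]
print(total, fixed_base, template_base)
```

Each of the four labeled pools holds 64 overhangs, and in this model exactly one pool can ligate to a given template overhang, so the label observed after washing identifies a single template base per cycle.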

This unique region, and its corresponding amplification primer, may differ
during every sequencing cycle, or during every several sequencing cycles. By using
ligated adaptors and corresponding amplifying primers that differ in each cycle, uncut
products from Step 1 are not amplified, preventing uncut products from generating
background signal in subsequent cycles. The PCR product is bound to streptavidin, and
the entire process is repeated, sequencing a nucleotide nine nucleotides further within the
original nucleic acid segment during each cycle of cutting, template-directed ligation,
and amplification of the desired template precursor. During Step 1 of the subsequent
cycle, digestion with FokI cleaves both strands of the DNA and generates a new 5'
overhang sequence with each strand shortened by nine nucleotides when compared to the
template at the end of the prior Step 1. (This shortening of the template precursor
following each cycle is not shown in Figures 1 - 4.)
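
The nine nucleotide stepping described above, combined with the staggered initial adaptors mentioned earlier, determines which positions of a segment are read. The sketch below is a simplified bookkeeping aid only; the function names are hypothetical, positions are 0-indexed for convenience, and the use of nine initial offsets (0 through 8) is an assumption chosen to show how staggered templates could jointly cover every position.

```python
def positions_read(offset, interval=9, cycles=5):
    """Positions identified on one template that starts at the given offset."""
    return [offset + interval * cycle for cycle in range(cycles)]


def coverage(offsets, interval=9, cycles=5):
    """Sorted union of positions read across templates with staggered offsets."""
    covered = set()
    for offset in offsets:
        covered.update(positions_read(offset, interval, cycles))
    return sorted(covered)


single = positions_read(0, cycles=3)
combined = coverage(range(9), cycles=3)
print(single)
print(combined)
```

A single template reads every ninth position, whereas nine templates offset by one nucleotide each would, under these assumptions, read a contiguous stretch of 9 x cycles positions.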

Additional steps can be taken to increase the efficiency of each step, and may 
prove necessary in implementing a protocol that does not use amplification to regenerate 
the template precursor during each cycle. These additional steps include: 

1) Treating the template with alkaline phosphatase following or during restriction
endonuclease cutting (Step 1 of Figure 1). This de-phosphorylates the 5' end of each
template, preventing ligation of one template to another.

2) Using adaptors with upper strand 3' ends that are blocked by a 3' phosphate or 
blocked by a 3' dideoxy nucleotide. This prevents ligation of one adaptor to another 
during Step 2 of the method of Figure 1.

3) Incubating with a DNA polymerase and the four ddNTPs following the
adaptor ligation step (Step 2 in Figure 1). This fills in the recessed 3' end of those
templates that escaped adaptor ligation, and caps these ends so that they cannot undergo
ligation (Atkinson MR, MP Deutscher, A Kornberg, AF Russell, JG Moffatt, Enzymatic
Synthesis of DNA 1969; 8:4897-4904). This additional step prevents templates that
failed to undergo adaptor ligation during a given cycle from undergoing adaptor ligation
in subsequent cycles, thus eliminating background signal resulting from incomplete
ligation of templates.

4) Quenching retained fluorescent label resulting from incomplete cutting by FokI
by photo-bleaching immediately prior to Step 1, or removing it through cleavage of the
label by using a labile linkage (Dawson BA, T Herman, J Lough, Journal of Biological
Chemistry 1989; 264:12830-12837; Olejnik J, E Krzymanska-Olejnik, KJ Rothschild,
Nucleic Acids Research 1996; 24:361-366). This decreases background fluorescent
signal from previous cycles.

If the lower strand of the adaptor is ligated, the upper strand's 3' end can be
blocked, non-blocked and added later, or de-blocked (via dephosphorylating a 3'
phosphate, Cameron V, OC Uhlenbeck, Biochemistry 1977; 16:5120-5126 or, for
example, by the method described in Metzker ML, Raghavachari R, Richards S, Jacutin
SE, Civitello A, Burgess K and RA Gibbs, Nucleic Acids Res. 1994; 22:4259-4267 and
Canard B and RS Sarfati, Gene 1994; 148:1-6). Also, an intact double-stranded segment
can be generated, without nicks, using a DNA polymerase with strand displacement
activity or with a 5' exonuclease activity, in a nick translation reaction (Rigby PWJ, M
Dieckmann, C Rhodes, P Berg, J Mol Biol 1977; 113:237-251). Such strand
displacement or nick translation could occur with concurrent hemi-methylation of
internal recognition domains for the class-IIS restriction endonuclease using the primer
extension strategy of Han and Rutter (Han J, Rutter WJ, Nucleic Acids Res 1988;
16:11837).

If the upper strand of the adaptor is ligated, an intact double-stranded segment
could be generated, without nicks, by using a DNA polymerase to generate the
complement to the adaptor's ligated upper strand. This polymerization could occur with
concurrent hemi-methylation of the adaptor-encoded recognition domain for the class-IIS
restriction endonuclease using the polymerase extension in the presence of a
methylated nucleotide (when sequencing with a class-IIS restriction endonuclease that
recognizes a hemi-methylated recognition domain; also, if the ligated upper strand's
recognition domain sequence were methylated, both strands of the recognition domain
would be methylated using this method). If the adaptor were double-stranded, the
unligated lower strand of the adaptor could be digested by nick translation using a DNA
polymerase with 5' exonuclease activity, or by using a DNA polymerase with strand
displacement activity.

Figure 2 illustrates a second embodiment of the sequencing method of this
invention wherein a class-IIS restriction endonuclease generates a 3' overhang, and
a nucleotide is sequenced at each interval by template-directed ligation. In Figure 2, this
embodiment is illustrated using the class-IIS restriction endonuclease BseRI, and the
template precursor has a biotinylated end that allows it to be bound to streptavidin. In
Step 1, the template precursor is cleaved with BseRI. BseRI has the following
recognition domain and cut site:

    5' GAGGAG (N)10
    3' CTCCTC (N)8

BseRI generates a two nucleotide long 3' overhang positioned eight nucleotides away
from one side of the recognition domain, so that sequencing can be carried out in
intervals of eight nucleotides. BseRI digestion cleaves both strands of the
double-stranded DNA, generating a DNA template with a 3' overhang sequence. The bound
template is washed to remove the cleaved ends. In Step 2, the DNA template (3'
overhang sequence) undergoes ligation in the presence of four adaptors. These adaptors
contain the sequence for the recognition domain for BseRI and have an adjacent two
nucleotide long 3' overhang consisting of one nucleotide with 4-fold degeneracy and a 3'
terminus with one of the four normal nucleotides. Since the four adaptors each have one
degenerate nucleotide and four distinct 3' terminal nucleotides, there are 16 distinct
sequences. The adaptors shown are double-stranded, because this increases the ligation
efficiency. There is one ligation reaction during each sequencing cycle. In each ligation,
all four adaptors are present, and each adaptor is preferably tagged with a distinct
fluorescent label; each label identifies the terminal nucleotide at the
single-stranded 3' end of the adaptor. Ligation of the upper strand of the adaptor occurs if the
above mentioned 3' nucleotide is complementary to the nucleotide on the 3' end of the
DNA template at the ligation junction. Following ligation and washing to remove the
unligated adaptors, identification of the ligated adaptor can be accomplished by
fluorometry, revealing the sequence of the DNA template at the ligation junction (Step
3). In Step 4, the ligated template from Step 2 undergoes PCR amplification using a
biotinylated primer and using a primer that is homologous to a unique portion of the
adaptor's ligated upper strand. If the lower strand underwent the ligation reaction that
sequenced the DNA, by using an upper strand that had its fixed nucleotide in the 3'
single-stranded portion of the adaptor immediately adjacent to the double-stranded portion of
the adaptor, the non-biotinylated primer would be complementary to a unique portion in
the ligated adaptor's lower strand. This unique region, and its corresponding
amplification primer, may differ during every sequencing cycle, or during every several
sequencing cycles, preventing uncut products from a prior cycle from generating
background signal in subsequent cycles. The PCR product is bound to streptavidin, and
the entire process is repeated, sequencing a nucleotide eight nucleotides further within
the original nucleic acid segment during each cycle of cutting, template-directed ligation,
and in vitro amplification of the desired template precursor. During Step 1 of each
subsequent cycle, digestion with BseRI cleaves both strands of the DNA and generates a
new 3' overhang sequence with each strand shortened by eight nucleotides when
compared to the template at the end of the prior Step 1.
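
The cut geometry and adaptor set described above for BseRI can be sketched in a few lines of Python. This is an illustrative sketch only, not part of the disclosed protocol: the function names and the example sequence are hypothetical, indices are 0-based positions on the upper strand, and only the first recognition domain in the string is considered.

```python
SITE = "GAGGAG"               # BseRI recognition domain, upper strand, 5'->3'
TOP_CUT, BOTTOM_CUT = 10, 8   # cut offsets past the recognition domain, per the text
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def bseri_cut(upper: str) -> dict:
    """Locate the first BseRI recognition domain and return the cut geometry."""
    site_end = upper.index(SITE) + len(SITE)
    top_cut = site_end + TOP_CUT        # upper strand is cut after this index
    bottom_cut = site_end + BOTTOM_CUT  # lower strand is cut two bases earlier
    return {
        "top_cut": top_cut,
        "bottom_cut": bottom_cut,
        # Upper-strand sequence of the two-base region left single-stranded
        "overhang_region": upper[bottom_cut:top_cut],
    }


def adaptor_overhangs() -> dict:
    """Map each fixed 3'-terminal base to its four overhangs (degenerate N + base)."""
    return {fixed: [n + fixed for n in "ACGT"] for fixed in "ACGT"}


def ligating_adaptor(template_junction_base: str) -> str:
    """Fixed 3' base of the adaptor pool expected to ligate at the junction."""
    return COMPLEMENT[template_junction_base]


if __name__ == "__main__":
    geometry = bseri_cut("TTGAGGAGACGTACGTACGGCC")  # hypothetical sequence
    print(geometry)
    print(sum(len(v) for v in adaptor_overhangs().values()), "adaptor overhangs")
```

The two-base difference between the cut offsets is what produces the two nucleotide long 3' overhang, and the four pools of four overhangs account for the 16 distinct adaptor sequences.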

Another step can be taken to prevent templates that do not undergo ligation
during a given cycle from undergoing ligation in a subsequent cycle. Following adaptor
ligation (Step 2 of Figure 2), incubation with alkaline phosphatase will dephosphorylate
the 5' end of those templates that did not undergo ligation to an adaptor, preventing these
templates from undergoing adaptor ligation in subsequent cycles. If amplification (Step
4 of Figure 2) is not used, following ligation of the adaptor's upper strand (Step 2 of
Figure 2), the lower strand of the DNA being sequenced can prime template-directed
polymerase extension using a DNA polymerase with a 3' exonuclease activity, in the
presence of the four dNTPs, recognizing that the DNA polymerase preferably has a 5'
exonuclease activity or a strand displacement activity if the adaptor has a lower strand.
This will re-synthesize the lower strand of the attached adaptor, eliminating the nick and
any mismatches while generating a template precursor. Also, those templates which did
not undergo adaptor ligation will be rendered blunt ended by the 3' exonuclease activity
of the DNA polymerase, preventing adaptor ligation in subsequent cycles. When using a
restriction endonuclease that generates a 3' overhang, a terminal transferase can be used
to add a single dideoxy nucleotide to the end of the template. This terminal nucleotide
can act as a barb in a hook to help hold the adaptor in place, as each adaptor can share a
nucleotide complementary to the dideoxy nucleotide in each adaptor's annealing strand,
which increases the efficiency of adaptor ligation. In this case, sequencing
occurs in an interval that is one nucleotide shorter than the distance between the
recognition domain and the cleavage domain.

When a DNA polymerase is used to generate the complement to the adaptor's
ligated upper strand, this polymerization may be performed with concurrent
hemi-methylation of the adaptor-encoded recognition domain for the class-IIS endonuclease
using the polymerase extension in the presence of a methylated nucleotide (when
sequencing with a class-IIS restriction endonuclease that recognizes a hemi-methylated
recognition domain; also, if the ligated upper strand's recognition domain sequence were
methylated, both strands of the recognition domain would be methylated using this
method). If the adaptor were double-stranded, the unligated lower strand of the adaptor
could be digested by nick translation using a DNA polymerase with 5' exonuclease
activity, or by using a DNA polymerase with strand displacement activity.

If the lower strand of the adaptor is ligated, an intact double-stranded segment
could be generated, without nicks, by using a DNA polymerase with a 5' exonuclease
activity, in a nick translation reaction or strand displacement reaction (Rigby PWJ, M
Dieckmann, C Rhodes, P Berg, J Mol Biol 1977; 113:237-251) using the upper strand of
the adaptor as a primer. Such nick translation or strand displacement could occur with
concurrent hemi-methylation of internal recognition domains for the class-IIS restriction
endonuclease using the primer extension strategy of Han and Rutter (Han J, Rutter WJ,
Nucleic Acids Res 1988; 16:11837).

Figure 3 shares with Figure 1 the use of a class-IIS restriction endonuclease that
generates a 5' overhang, but sequences a nucleotide at each interval by template-directed
polymerization instead of template-directed ligation. In Step 2 of Figure 3, the DNA
template generated following Fok I digestion is sequenced by template-directed
polymerization in the presence of four deoxynucleotide terminators (e.g. ddNTPs), each
tagged with a distinct fluorescent label. Following polymerization and washing, which
removes unincorporated terminators, identification of the incorporated terminator can be
accomplished by fluorometry, revealing the sequence of one nucleotide in the DNA
template, as shown in Step 3. Step 4 illustrates the ligation of an adaptor containing the
sequence for the recognition domain for Fok I and an adjacent three nucleotide long 5'
overhang consisting of three nucleotides with 4-fold degeneracy. The ligation illustrated
in Figure 3 is template-directed but is not used to discriminate between nucleotides at the
ligation junction. Since the single adaptor has three degenerate nucleotides, there are 64
distinct sequences. The adaptors shown are double-stranded, as this increases the
ligation efficiency. The amplification shown in Step 5 of Figure 3 corresponds to Step 4
of Figure 1, except that the amplifying primer is homologous to the ligated strand of the
adaptor, which is the upper strand in Figure 3.
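
The single-nucleotide fill-in and the degenerate adaptor of Figure 3 can be illustrated with the following Python sketch. It is a simplified model under stated assumptions: the overhang is written 5' to 3' so that its last character is the base nearest the duplex, the function names are hypothetical, and enzyme fidelity is ignored.

```python
from itertools import product

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def incorporated_terminator(overhang_5p: str) -> str:
    """Return the base of the labeled terminator added to the recessed 3' end.

    overhang_5p is the single-stranded 5' overhang written 5'->3'; the base
    nearest the duplex (its last character) is the first one copied.
    """
    return COMPLEMENT[overhang_5p[-1]]


def remaining_overhang(overhang_5p: str) -> str:
    """Overhang left single-stranded after one terminator is incorporated."""
    return overhang_5p[:-1]


def degenerate_adaptor_overhangs(length: int = 3) -> list:
    """Enumerate every sequence represented by a fully degenerate overhang."""
    return ["".join(p) for p in product("ACGT", repeat=length)]


if __name__ == "__main__":
    overhang = "GATC"  # hypothetical four nucleotide 5' overhang left by Fok I
    print(incorporated_terminator(overhang), remaining_overhang(overhang))
    print(len(degenerate_adaptor_overhangs()), "adaptor sequences")
```

Incorporating one terminator shortens a four nucleotide overhang to three nucleotides, which is why the adaptor carries a three nucleotide degenerate overhang representing 4 x 4 x 4 = 64 sequences.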
Since the upper strand of the adaptor undergoes ligation, an intact
double-stranded segment could be generated, without nicks, by using a DNA polymerase to
generate the complement to the adaptor's ligated upper strand. The lower strand of the
DNA segment being sequenced can be de-blocked (via dephosphorylating a 3' phosphate,
or by the method described in Metzker ML, Raghavachari R, Richards S, Jacutin SE,
Civitello A, Burgess K and RA Gibbs, Nucleic Acids Res. 1994; 22:4259-4267 and
Canard B and RS Sarfati, Gene 1994; 148:1-6), allowing it to act as a primer. This
polymerization could occur with concurrent hemi-methylation of the adaptor-encoded
recognition domain for the class-IIS endonuclease using the polymerase extension in the
presence of a methylated nucleotide (when sequencing with a class-IIS restriction
endonuclease that recognizes a hemi-methylated recognition domain; also, if the ligated
upper strand's recognition domain sequence were methylated, both strands of the
recognition domain would be methylated using this method).

In the strategy illustrated in Figure 3, if the class-IIS restriction endonuclease
generates a single nucleotide 5' end extension, template-directed polymerization will
generate a blunt end, so that adaptor ligation is blunt ended, as opposed to the
template-directed ligation illustrated in Figure 3. Furthermore, if a class-IIS restriction
endonuclease is discovered that generates a blunt end, or a blunt end is generated using a
single strand exonuclease, a nucleotide at this end could be sequenced by
template-directed polymerization through a nucleotide exchange reaction, in which the 3'
exonuclease activity of a DNA polymerase is used to generate a recessed 3' end that can
undergo template-directed polymerization, incorporating a labeled nucleotide and once
again generating a blunt end that would undergo ligation to the adaptor (Atkinson MR,
MP Deutscher, A Kornberg, AF Russell, JG Moffatt, Enzymatic Synthesis of DNA 1969;
8:4897-4904; Englund PT, Journal of Biological Chemistry 1971; 246:3269-3276). In
this case, the template is formed fleetingly, through the 3' exonuclease activity of a DNA
polymerase during the exchange reaction that constitutes the DNA sequencing step. If
the incorporated labeled terminator inhibits adaptor ligation, only a fraction of a given
terminator needs to carry a label, and only a fraction of a given template needs to
undergo labeling, because only a fraction of a template must undergo adaptor ligation to
allow regeneration of the desired template precursors by DNA amplification in vitro.
This illustrates how product regeneration allows separation of the template generation
and template sequencing elements of this method without physical separation of these
elements into separate aliquots.

Figure 4 illustrates a variation of the method of Figure 3 in which the overhang
appended to the adaptor-encoded sequence is attached to a solid phase. In this variation,
the PCR primer that varies between cycles carries the biotin moiety. Following Fok I
cutting, the end encoded by the adaptor is attached to the solid matrix, and a nucleotide
in this end is sequenced by template-directed polymerization. In addition, this end could
be sequenced by template-directed ligation, in which case the class-IIS restriction
endonuclease could generate a 5' overhang or a 3' overhang. Another variation that
could be carried out would be to combine sequencing by template-directed
polymerization with sequencing by template-directed ligation. For example, if the
adaptor undergoing template-directed ligation in Step 4 of Figure 4 were a sequencing
adaptor, as shown in Figure 1, sequencing could be accomplished by template-directed
ligation and template-directed polymerization during each cycle using the same template
precursor. Also, it is clear that the process of sequencing each template can be separated
from the process of generating each template, so that a Fok I generated four nucleotide
overhang could be sequenced, for example, by template-directed ligation and, in a
separate reaction, by fill-in with labeled ddNTPs.

Variants of protocols shown in Figures 1-4 not requiring the exponential
amplification step (Step 4 of Figures 1 and 2 and Step 5 of Figures 3 and 4) can be
developed using steps that optimize completion of each step and that "cap" incomplete
reactions, as described previously in conjunction with striding. For example, MmeI has a
recognition domain that is separated from its cleavage domain by 18 bp. Therefore, one
could sequence over a span of 90 nucleotides in five iterative cycles, as opposed to
only 5 nucleotides when using a method that sequences consecutive nucleotides. Other
measures that may increase the number of sequencing cycles that can be carried out
without using exponential in vitro amplification include:

1) Modification of a restriction endonuclease recognition domain by use of a
base analog to improve binding to the restriction enzyme, so that a modified
double-stranded oligonucleotide binds to its restriction endonuclease more effectively than the
naturally occurring recognition domain (Lesser DR, MR Kurpiewski, T Waters, BA
Connolly, and L Jen-Jacobson, Proc. Natl. Acad. Sci. USA 1993; 90:7548-7552). Using a
ligated adaptor with a modified class-IIS recognition domain may improve restriction
endonuclease binding and cutting efficiency. For example, a hybrid restriction
endonuclease could be generated in which a protein that recognizes a certain DNA
sequence or moiety is attached to the cleaving domain of a class-IIS restriction
endonuclease, generating a new specificity with a defined distance between a cleavage
domain and a recognition domain (Kim Y-G, J Cha, S Chandrasegaran, Proc. Natl.
Acad. Sci. USA 1996; 93:1156-1160).

2) Ligating adaptors that are covalently attached to a class-IIS restriction
endonuclease. A variety of enzymes have been covalently attached to oligonucleotides
(Jablonski E, EW Moomaw, RH Tullis, JL Rith, Nucleic Acids Res 1986; 14:6115-6128;
Li P, PP Medon, DC Skingler, JA Lanser, RH Symons, Nucleic Acids Res 1987;
15:5275-5287; Ghosh SS, PM Kao, DY Kwoh, Anal Biochem 1989; 178:43-51). A
double-stranded recognition domain with the class-IIS restriction endonuclease
attached to it could be used to target cutting to the cleavage domain adjacent to the
ligated adaptor's recognition domain, so long as buffer conditions during the prior
ligation do not permit cutting. Since the restriction endonuclease would only be
positioned immediately adjacent to the desired recognition site, digestion would not be
mediated by internal recognition domains, so that methylation of internal recognition
domains would not be necessary.

3) Using a class-IIS restriction endonuclease that requires a methylated
recognition domain, and will recognize a hemi-methylated recognition domain. In this
case, the recognition domain can be hemi-methylated during adaptor ligation using an
adaptor strand that contains a methylated strand of this domain, so that only this
recognition domain would be recognized. A class-IIS restriction endonuclease that
requires a methylated recognition domain could be used in this method and would be
advantageous, as it would obviate the need to block internal recognition domains for this
class-IIS restriction endonuclease.
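
The striding arithmetic given above for MmeI (18 bp per cycle, five cycles, 90 nucleotides) can be checked with a short Python sketch. The function names are hypothetical, and the sketch assumes that every cycle advances by exactly one stride with no loss of register.

```python
def span(stride: int, cycles: int) -> int:
    """Total distance covered after a number of iterative cycles."""
    return stride * cycles


def interrogated_offsets(stride: int, cycles: int) -> list:
    """Offset of the nucleotide read in each cycle, counted from the start
    of the first interval (assumes one nucleotide is read per cycle)."""
    return [stride * cycle for cycle in range(1, cycles + 1)]


if __name__ == "__main__":
    print(span(18, 5))                  # striding with an 18 bp interval
    print(span(1, 5))                   # consecutive-nucleotide sequencing
    print(interrogated_offsets(8, 3))   # eight nucleotide interval, three cycles
```

Under these assumptions the span grows linearly with the distance between the recognition domain and the cleavage domain, which is why a long-reach enzyme covers more sequence in a fixed number of cycles.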

Restriction endonucleases and DNA ligases have been used in this invention, but
different enzymes or reactive chemicals could be used to generate the templates
described in this invention. Mutated enzymes that carry out the same role can substitute
for their naturally occurring counterparts (Kim JJ, KT Min, MH Kim, SJ Augh, B-D
Dim, D-S Lee, Gene 1996; 171:129-130). Furthermore, various entities can substitute
for DNA ligase and restriction endonucleases. Template-directed ligation has been carried out
through chemical condensation (Gryaznov SM, R Schultz, SK Chaturvedi, RL Letsinger,
Nucleic Acids Research 1994; 22:2366-2369; Dolinnaya NG, M Blumenfeld, IN
Merenkova, TS Oretskaya, NF Krynetskaya, MG Ivanovskaya, M Vasseur and ZA
Shabarova, Nucleic Acids Research 1993; 21:5403-5407; Luebke KJ and PB Dervan,
Nucleic Acids Research 1992; 20:3005-3009), and site-specific cleavage of DNA has
been accomplished using oligonucleotides linked to reactive chemicals or non-specific
nucleases (Lin S-B, KR Blake, PS Miller, Biochemistry 1989; 28:1054-1061; Strobel SA,
LA Doucette-Stamm, L Riba, DE Housman, PB Dervan, Science 1991; 254:1639-1642;
Francois J-C, T Saison-Behmoaras, C Barbier, M Chassignol, NT Thuong, C Helene,
Proc. Natl. Acad. Sci. USA 1989; 86:9702-9706; Pei D, DR Corey, PG Schultz, Proc.
Natl. Acad. Sci. USA 1990; 87:9858-9862). Non-protein enzymes have also been used
to manipulate DNA, as ribozymes have mediated both the cleavage and ligation of DNA
(Tsang J, GF Joyce, Biochemistry 1994; 19:5966-5973; Cuenoud B, JW Szostak, Nature
1995; 375:611-614).

Nucleotide analogs have been used in a variety of functions, and
template-directed ligation could be mediated by adaptors with single-stranded ends containing
universal nucleotides or discriminatory nucleotide analogues (Loakes D, DM Brown,
Nucleic Acids Research 1994; 22:4039-4043; Nichols R, PC Andrews, P Zhang, DE
Bergstrom, Nature 1994; 369:492-493). In addition, modified nucleotides other than
methylated nucleotides have been found that block recognition by restriction
endonucleases, and can be incorporated through primer-directed DNA synthesis (Huang
L-H, CM Farnet, KC Ehrlich, M Ehrlich, Nucleic Acids Research 1982; 10:1579-1591;
Seela F, W Herdering, A Kehne, Helvetica Chimica Acta 1987; 70:1649-1660; and Seela
F, A Roling, Nucleosides and Nucleotides 1991; 10:715-717).

Technology now exists for the generation of a thousand distinct DNA segments
at one time using the polymerase chain reaction (PCR), thus allowing the concurrent
generation of a thousand DNA template precursors. Development of technology for
template precursor generation is facilitated by present methods for the concurrent
generation of multiple oligonucleotides, as oligonucleotides serve as primers for
template precursor generation through DNA amplification in vitro (Caviana Pease A,
Solas D, Sullivan EJ, Cronin MT, Holmes CP, Fodor SPA, Proc Natl Acad Sci USA
1994; 91:5022-5026). Micro-chip based technology will allow the amplification of over
10,000 distinct DNA segments, each containing several hundred base pairs of DNA
(Shoffner MA, J Cheng, GE Hvichia, LJ Kricka, P Wilding, Nucleic Acids Research
1996; 24:375-379, and J Cheng, Shoffner MA, GE Hvichia, LJ Kricka, P Wilding,
Nucleic Acids Research 1996; 24:380-385). This will allow a large portion of the human
genome of an individual to be sorted on a biochip. Rapid technical progress in DNA
sample generation creates a need for technology that can rapidly and accurately sequence
arrayed samples of DNA in parallel. This invention addresses the need for technology
that can sequence thousands of distinct DNA samples in parallel.

Technology for generating double-stranded template precursors via PCR, and for
the fluorometric assessment of thousands of locations on a chip, will allow the
sequencing of several thousand PCR products simultaneously using this invention,
allowing large amounts of DNA to be sequenced using repetitive incubations in simple
reagents. The template precursors can be bound to a silicon chip or contained in a
matrix of chambers, so that cycles of adaptor ligation, template-directed DNA
polymerization for amplification or sequencing, and cutting can be carried out on
numerous templates in parallel.

Technology that has been developed for the simultaneous assessment of
thousands of locations on a chip will facilitate the simultaneous sequencing of these
templates. For example, a microchip has been designed for the quantitative detection of
DNA labeled with fluorescent, chemiluminescent or radioactive reporter groups (Eggers
M, M Hogan, RK Reich, J Lamture, D Ehrlich, M Hollis, B Kosicki, T Powdrill, K
Beattie, S Smith, R Varma, R Gangadharan, A Mallik, B Burke and D Wallace,
BioTechniques 1994; 17:516-524). This microchip consists of a charge-coupled device
(CCD) detector that quantitatively detects and images the distribution of labeled DNA
near spatially addressable pixels. DNA has been deposited onto a silicon wafer with a
micro-jet using DNA with an amine modified 5' end, which is linked to the SiO2 surface
by secondary amine formation. This immobilized DNA is on an SiO2 wafer overlying
the pixels of the charge-coupled device. A prototype 420 x 420 pixel device has been
developed that can analyze 176,400 samples in parallel, enabling the detection of
thousands of label incorporation events on a square centimeter chip (Eggers M, M
Hogan, RK Reich, J Lamture, D Ehrlich, M Hollis, B Kosicki, T Powdrill, K Beattie, S
Smith, R Varma, R Gangadharan, A Mallik, B Burke and D Wallace, BioTechniques
1994; 17:516-524).
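
The relationship between the pixel grid and the number of addressable samples can be illustrated as follows. This Python sketch assumes one sample per pixel and a row-major readout order; both are illustrative assumptions, not details taken from the cited device, and the function names are hypothetical.

```python
def address_count(rows: int, cols: int) -> int:
    """Number of addressable sample sites, assuming one sample per pixel."""
    return rows * cols


def index_to_address(index: int, cols: int) -> tuple:
    """Convert a row-major readout index into a (row, column) address."""
    return divmod(index, cols)


def address_to_index(row: int, col: int, cols: int) -> int:
    """Inverse of index_to_address for the same row-major layout."""
    return row * cols + col


if __name__ == "__main__":
    print(address_count(420, 420))
    print(index_to_address(421, 420))
```

For the 420 x 420 prototype, this gives the 176,400 parallel samples stated in the text.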

Technology that will further enhance the utility of the present invention includes
hybridization-based approaches for sorting genomic DNA (as opposed to sequencing by
hybridization) into unique restriction fragments, which can then be amplified at their
addresses using a single set of PCR primers (Chetverin AB, FR Kramer, BioTechnology
1994; 12:1093-1099). In the future, it will be possible to apply the present invention to
the sequencing of large portions of genomes for which there is no prior sequence
information without cloning in vivo (e.g., in E. coli). Innovative hybridization-based
strategies have been proposed that use oligonucleotide arrays to sort restriction
endonuclease generated fragments on the basis of their unique sequences. In one
strategy, genomic DNA undergoes complete restriction endonuclease digestion. This is
followed by ligation of the DNA ends to adaptors. These restriction fragments are sorted
on a hybridization array of oligonucleotides through annealing to the adaptor sequence
as well as to unique adjacent sequences in the DNA fragments. This is followed by a
ligation step that requires perfect complementarity of the unique sequence adjacent to



WO 99/45153 PCT/US99/04883 

-48- 

the adaptor, resulting in sorting of the restriction fragments into unique addresses on the
biochip. An additional step repeats this strategy using the opposite end of each
fragment. These sorted fragments can then be PCR amplified in situ using a single set of
primers that anneal to the adaptor sequences (Chetverin AB, FR Kramer, BioTechnology
1994; 12:1093-1099). Integrating this hybridization-based technology into the present
method will allow the sequencing of genomes using a single set of PCR primers without
prior sequence information.

An area of technology development that can also be useful to the application of
the proposed method is oligonucleotide synthesis from the 5' to 3' direction (Coassin PJ,
JB Rampal, RS Matson, International Workshop on Sequencing by Hybridization
(Woodlands, TX) 1993; Report 8). This will allow amplifying primers to be
manufactured on a chip. These bound primers could be used to amplify PCR products,
as it has recently been confirmed that a primer can mediate PCR amplification while
bound to a solid immobile matrix (Kohsaka H, DA Carson, Journal of Clinical
Laboratory Analysis 1994; 8:452-455).

Kits 

A variety of kits are provided for carrying out different embodiments of the
invention. Generally, kits of the invention include adaptors tailored for the enzyme, e.g.,
a class-IIS restriction endonuclease, and the detection scheme of the particular
embodiment. Kits further include the enzyme reagents, the ligation reagents, PCR
amplification reagents, and instructions for practicing the particular embodiment of the
invention. In embodiments employing natural protein endonucleases and ligases, ligase
buffers and endonuclease buffers may be included. In some cases, these buffers may be
identical. Such kits may also include a methylase and its reaction buffer. Preferably,
kits also include a solid phase support, e.g. magnetic beads, for anchoring target DNA
segments. In one preferred kit, labeled ddNTPs are provided. In another preferred kit,
fluorescently labeled probes are provided such that probes corresponding to different
terminal nucleotides of the probe or the target polynucleotide carry distinct spectrally
resolvable fluorescent dyes. As used herein, "spectrally resolvable" means that the dyes
may be distinguished on the basis of their spectral characteristics, particularly fluorescence
emission wavelength, under conditions of operation. Thus, the identity of the one or
more terminal nucleotides would be correlated to a distinct color, or perhaps a ratio of
intensities at different wavelengths. More preferably, four such probes are provided that
allow a one-to-one correspondence between each of four spectrally resolvable
fluorescent dyes and the four possible terminal nucleotides on a target DNA segment.
Sets of spectrally resolvable dyes are disclosed in U.S. Pat. Nos. 4,855,225 and
5,188,934; International application PCT/US90/05565; and Lee et al., Nucleic Acids
Research 20:2471-2483 (1992).
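
The one-to-one correspondence between four spectrally resolvable dyes and the four terminal nucleotides can be sketched as a simple base-calling rule. This Python sketch is illustrative only: the dye names, the dye-to-base assignment, and the intensity-ratio threshold are hypothetical placeholders, not values specified by this disclosure.

```python
# Hypothetical dye-to-nucleotide assignment; any one-to-one mapping would do.
DYE_TO_BASE = {"dye1": "A", "dye2": "C", "dye3": "G", "dye4": "T"}


def call_base(intensities: dict, min_ratio: float = 2.0) -> str:
    """Call the nucleotide whose dye channel dominates at one address.

    intensities maps each dye name to its measured emission intensity.
    Returns "N" when the strongest channel does not exceed the second
    strongest by at least min_ratio (an assumed, adjustable threshold).
    """
    ranked = sorted(intensities.items(), key=lambda item: item[1], reverse=True)
    (top_dye, top), (_, second) = ranked[0], ranked[1]
    if top <= 0 or (second > 0 and top / second < min_ratio):
        return "N"
    return DYE_TO_BASE[top_dye]


if __name__ == "__main__":
    print(call_base({"dye1": 900, "dye2": 50, "dye3": 40, "dye4": 30}))
    print(call_base({"dye1": 100, "dye2": 90, "dye3": 5, "dye4": 5}))
```

Using a ratio rather than an absolute intensity reflects the remark above that nucleotide identity may be correlated to a ratio of intensities at different wavelengths.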

Automation of Iterative and Regenerative DNA Sequencing

The foregoing sequencing steps, being iterative, may be automated and applied in
parallel to an arbitrary number of separate samples. Such automation permits the
sequencing method to generate a large amount of sequence information, and this
information is further enhanced by the subinterval or adjacency order existing between
the products of successive steps, as well as, in a multiplex scheme, by the immobilized
spatial locations in which sequencing occurs.

Figure 8 shows a schematic outline of the overall architecture of a system 100 for
automating sequencing according to the present invention, which is preferably
implemented by a processing apparatus 20 which operates on support arrays 10 such as
microtiter plates or specially fabricated chip arrays that consist of an array of wells,
chambers or surface immobilization positions each capable of holding a DNA sample at
a localized site. Device 20 performs four general types of operations in parallel on the
DNA segments in the support array 10, and these are shown schematically as separate
classes of processes arrayed in stations or functional groupings 30, 40, 50, 60 around the
central device 20.

As shown, the four basic processes involve the addition of reagents at 30, washing,
separating or preparation steps at 40, reading the labeled segments at 50, and incubation and
amplification steps at 60. These are schematically illustrated as four separate
workstations through which the support array 10 is shuttled or moved, but are preferably
implemented with varying degrees of integration into the basic array handler 20. Thus,
for example, the array 10 may stay in position on a stage to which the necessary conduits
or manifolds are attached for addition of the reagents and washing of the samples, and
which may be heated or cooled in cycles to incubate and amplify all materials on the
support at once. Similarly, for reading, a charge-coupled device may be carried with
appropriate optics by the device 20 to read the labeled material in each sample well
between successive steps, or may be integrated into a cover plate or the structure of the
sample support. In either case, each of these subunits or accessory portions of the
system operates under control of a common controller 70 which coordinates the
movement, heating, provision of reagents and reading of the various steps so that the
readout of nucleotide labels by the reading section 50 is stored and recorded for the
DNA samples at each location on the array 10.
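
The coordinating role of controller 70 can be sketched as a simple software loop. The following Python sketch is a deliberately simplified illustration: the class, its method names, and the reduction of each cycle to four operations in a fixed order are assumptions made for clarity, and the reader callback stands in for whatever detector interface an actual apparatus would expose.

```python
from dataclasses import dataclass, field


@dataclass
class Controller:
    """Stand-in for controller 70; records operations and stores readouts."""
    log: list = field(default_factory=list)

    def run_cycle(self, cycle: int, reader) -> dict:
        """Run one simplified cycle and return {address: called base}."""
        self.log.append((cycle, "add_reagents"))      # station 30
        self.log.append((cycle, "wash"))              # station 40
        calls = reader(cycle)                         # station 50
        self.log.append((cycle, "read"))
        self.log.append((cycle, "incubate_amplify"))  # station 60
        return calls

    def run(self, cycles: int, reader) -> dict:
        """Accumulate the ordered base calls for every array address."""
        reads = {}
        for cycle in range(1, cycles + 1):
            for address, base in self.run_cycle(cycle, reader).items():
                reads.setdefault(address, []).append(base)
        return reads


if __name__ == "__main__":
    # Hypothetical reader returning one call for a single address per cycle.
    fake_reader = lambda cycle: {(0, 0): "ACGT"[cycle % 4]}
    controller = Controller()
    print(controller.run(3, fake_reader))
    print(len(controller.log), "logged operations")
```

Keeping the readout keyed by array address mirrors the requirement that nucleotide labels be stored and recorded for the sample at each location on the array.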

As noted above, each of the DNA segments which are to be analyzed, which
may, for example, be PCR products or vector inserts, is immobilized so that it resides at
a unique address on the chip or support 10, and several hundred to thousands of DNA
segments are distributed on the chip. They simultaneously undergo a series of
incubations that result in the accumulation of sequence information. A reagent may be
delivered, for example, by a robotically carried comb or pipette array, or preferably by
bulk or flow-through addition of the reagent. Separate reagents in their respective
buffers are represented by the jar in the left hand portion of the diagram and these are
passed to the support array 10 by automated control in the order for performing the
sequencing chemistry described herein. Sequencing occurs either following
template-directed adaptor ligation (as described for Embodiments 1 and 2 in relation to Figures 1
and 2 herein) or following template-directed polymerization (as described in relation to
Figures 3 and 4) or following PCR incorporation of a labeled primer through
competitive oligonucleotide priming (Gibbs, R.A., Nguyen, P.N., and Caskey, C.T.,
Nucleic Acids Research, 17:2437-2448 (1989)). Simultaneous retrieval of sequence
information from several thousand templates, following template-directed incorporation
of a label, is then done by reader 50. Reading can be accomplished concurrently using a
charge-coupled device, which is illustrated on the top of Figure 8, or may be performed
in a slower scanning fashion by stepping the array past a line of scintillation or other
detectors. By operating with a support array in which the DNA segments are
immobilized in a small area and volume, a relatively strong signal is obtained free of the
spreading and cross-reading losses inherent in gel sequencing or migration-dependent
methods.

As described elsewhere herein, the method preferably includes a regeneration 
step. Illustratively, following the adaptor ligation step, an aliquot from each address
undergoes PCR amplification in order to regenerate a template precursor for the next
sequencing cycle. The appropriate primer sets and PCR mix are applied and the array
undergoes a number of incubations. Preferably the device 20 has a heated stage with a
Peltier cooler to accurately and quickly cycle the array through the required
amplification regimen, or the array may pass to a separate processing chamber, e.g. an
air oven thermal cycler of conventional type, for PCR amplification as illustrated on the
bottom of the diagram. Following incubation with a reagent or PCR amplification, the 
DNA segments are frequently magnetically pelleted and washed to remove the reagent 
and any byproducts prior to a subsequent step. The magnet and wash buffer are 
illustrated by device processes or subassembly 40 on the right hand portion of Figure 8. 

Once the necessary sets of adaptors and primers for cutting and amplification
have been determined, the process steps are straightforward, and well-defined nucleotide
determinations are achieved with small amounts of sample. The support arrays may thus
carry a large number of sites. A chip or group of chips with 90,000 defined addresses
will, for example, allow the amplification of 90,000 DNA segments using PCR.
Simultaneous amplification of a large number of samples may be done with a robotic
thermal cycler using the approach of Meier-Ewert S, E Maier, A Ahmadi, J Curtis, H
Lehrach. An automated approach to generating expressed sequence catalogues. Nature
1993; 361: 375-376 and Drmanac S, R Drmanac. Processing of cDNA and genomic
kilobase-size clones for massive screening, mapping, and sequencing by hybridization.
BioTechniques 1994; 17: 328-336, as applied to PCR. The invention also contemplates
that the support be a microchip, in which case the teachings of PCR amplification on a
microchip by several investigators are modified to include multiplex PCR amplification
features for carrying out the methods described here. See, for example, Wilding P, MA
Shoffner, LJ Kricka. PCR in a silicon microstructure. Clinical Chemistry 1994; 40:
1815-1818; Shoffner MA, J Cheng, GE Hvichia, LJ Kricka, P Wilding. Chip PCR. I.
Surface passivation of microfabricated silicon-glass chips for PCR. Nucleic Acids
Research 1996; 24: 375-379; Cheng J, MA Shoffner, GE Hvichia, LJ Kricka, P
Wilding. Chip PCR. II. Investigation of different PCR amplification systems in
microfabricated silicon-glass chips. Nucleic Acids Research 1996; 24: 380-385; Burns
MA, CH Mastrangelo, TS Sammarco, FP Man, JR Webster, BN Johnson, B Foerster, D
Jones, Y Fields, AR Kaiser, DT Burke. Microfabricated structures for integrated DNA
analysis. Proc. Natl. Acad. Sci. USA 1996; 93: 5556-5561.

Automated sequencing is described below for a chip with 90,000 addresses using
a protocol for Embodiment 1. One of the primers in each PCR amplification is
biotinylated, allowing these products to be bound to magnetic streptavidin. The opposite
primer contains the recognition domain for FokI restriction endonuclease. If FokI is
used as the restriction endonuclease, and sequencing is done in intervals of nine
nucleotides, nine initial templates are generated for each of 10,000 DNA regions to be
sequenced. This is accomplished by using primers with offset FokI restriction
endonuclease recognition domains, as described extensively elsewhere herein. In the
case where the DNA samples to be sequenced are vector inserts, primers are generated
that anneal to the vector, so that only a few primers need to be synthesized to sequence
the 90,000 DNA segments.

Following PCR amplification, the DNA segments are bound to magnetic
streptavidin and magnetically pelleted, washed, and incubated with FokI in the
corresponding buffer at 37°C, resulting in generation of the initial templates. After
magnetic pelleting and washing, the 90,000 initial templates are incubated with a DNA
ligase and the four sequencing adaptors, each with a unique label. Following a magnetic
pelleting and washing step to remove unligated adaptors, the ligated adaptor at each
address is identified, for example with an automated reader using a charge coupled
device. This is done in one embodiment by imaging the support array onto a CCD, and
using automated analysis of the image pixels to threshold and read the luminescent
labels, or by the approach described in Eggers M, M Hogan, RK Reich, J Lamture, D
Ehrlich, M Hollis, B Kosicki, T Powdrill, K Beattie, S Smith, R Varma, R Gangadharan,
A Mallik, B Burke, D Wallace. A microchip for quantitative detection of molecules
utilizing luminescent and radioisotope reporter groups. BioTechniques 1994; 17:
516-525 or Lamture JB, KL Beattie, BE Burke, MD Eggers, DJ Ehrlich, R Fowler, MA
Hollis, BB Kosicki, RK Reich, SR Smith, RS Varma, ME Hogan. Direct detection of
nucleic acid hybridization on the surface of a charge coupled device. Nucleic Acids
Research 1994; 22: 2121-2125.
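
By way of illustration only, the thresholding and reading of labels at each address may be sketched as follows. This is a minimal Python sketch under stated assumptions: the channel names, the intensity threshold, the minimum brightness ratio and the example values are all hypothetical and are not taken from the described reader.

```python
# Hypothetical sketch: identify the ligated adaptor at each array address
# from four label-channel intensities. Channel names, the threshold and the
# ratio cut-off are illustrative assumptions, not values from the text.

LABEL_TO_BASE = {"label_A": "A", "label_C": "C", "label_G": "G", "label_T": "T"}


def call_base(intensities, threshold=100.0, min_ratio=2.0):
    """Return the base of the brightest label, or 'N' if the call is unclear."""
    ranked = sorted(intensities.items(), key=lambda kv: kv[1], reverse=True)
    (best_label, best), (_, second) = ranked[0], ranked[1]
    if best < threshold:
        return "N"  # no label is clearly above background
    if second > 0 and best / second < min_ratio:
        return "N"  # two labels are too close to separate
    return LABEL_TO_BASE[best_label]


def read_array(image):
    """Map each address (row, column) to a base call."""
    return {address: call_base(channels) for address, channels in image.items()}


# Made-up intensities for three addresses.
example_image = {
    (0, 0): {"label_A": 950.0, "label_C": 40.0, "label_G": 35.0, "label_T": 50.0},
    (0, 1): {"label_A": 30.0, "label_C": 20.0, "label_G": 25.0, "label_T": 45.0},
    (1, 0): {"label_A": 500.0, "label_C": 480.0, "label_G": 10.0, "label_T": 12.0},
}
calls = read_array(example_image)
```

In this sketch an address with no signal above background, or with two labels of similar brightness, is reported as "N" rather than forced to a call; a practical reader might instead flag such addresses for re-reading.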

Following reading of the labels, new template-precursors are regenerated by PCR
amplification, bound to magnetic streptavidin, magnetically pelleted, washed, and cut
with FokI, generating a new set of templates corresponding to the previous set of
templates but with each strand shortened by nine nucleotides at that end when compared
to the prior corresponding template.

PCR amplification is preferably carried out in such a way as to limit "noise." This
may be accomplished by amplifying only a small portion of each ligation mixture to
prevent successive exponential PCR amplifications from generating an accumulation of
products during successive sequencing cycles. Obtaining a small aliquot from each
ligation mixture for PCR amplification is performed in an automated fashion by device
20, and this can be accomplished by one of several techniques: removal or retention of
an aliquot of the ligation mixture.

Removal of an aliquot for PCR amplification may be done by use of a dispersible
solid phase, such as magnetic streptavidin. In a microtiter plate embodiment a
subassembly such as a spotting robot that uses a pin transfer device may be used to
transfer a small aliquot from each site on the microtiter plates as reported in the
above-cited Meier-Ewert et al. article. When using a chip, a small aliquot can be removed by
using an analogous hedgehog comb device as reported in Rosenthal A, O Coutelle, M
Craxton. Large-scale production of DNA sequencing templates by microtitre format
PCR. Nucleic Acids Research 1993; 21: 173-174, or by using a blotter to retain a small
portion from each of the sample sites, followed by washing out of the remaining
contents. PCR amplification is then performed using these retained aliquots as the
templates. Other methods for retaining a small aliquot can be implemented such as a
low intensity magnetic separation, or by using a chip with chambers shaped or
positioned in relation to the flow path to retain a small aliquot by mechanical means
when supernatant is removed (e.g. with a lip).

Alternatively, to prevent the accumulation of PCR product during successive
sequencing cycles, the automated device may be operated to retain only a small amount
of each PCR product for subsequent steps. This can be done by using a streptavidin
coated manifold as reported in Lagerkvist A, J Stewart, M Lagerstrom-Fermer, U
Landegren. Manifold sequencing: Efficient processing of large sets of sequencing
reactions. Proc. Natl. Acad. Sci. USA 1994; 91: 2245-2249 and inserting the manifold
into the amplification mixture to bind a small proportion of the biotinylated PCR
products. In this case, the manifold-bound DNA segments are then moved to and dipped
into individual reagents in subsequent steps, rinsing the manifold with wash buffer
between steps, so that while PCR amplification occurs in the chip, other steps are carried
out using DNA segments that are bound to the manifold.

Removal or retention of an aliquot may also be effected by using a cleavable
linkage, e.g. a chemically- or photo-cleavable linkage arm such as reported in Dawson
BA, T Herman, J Lough. Affinity isolation of transcriptionally active murine
erythroleukemia cell DNA using a cleavable biotinylated nucleotide analog. Journal of
Biological Chemistry 1989; 264: 12830-12837, and Olejnik J, E Krzymanska-Olejnik,
KJ Rothschild. Photocleavable biotin phosphoramidite for 5'-end-labeling, affinity
purification and phosphorylation of synthetic oligonucleotides. Nucleic Acids Research
1996; 24: 361-366. In this case the cleavable linkage is employed for a portion, e.g. a
large fraction, of the linkages used to attach the ligated DNA to the solid support or
matrix. Cleavage then releases only the cleavably-bound DNA, permitting removal of a
controlled portion of the DNA products. The PCR process may also be controlled by
rendering much of the DNA product inaccessible to primer annealing and extension, for
example by binding the DNA to a non-dispersible solid matrix or by pelleting a
dispersible matrix. This takes advantage of the observation that immobilization of a
nucleic acid component during PCR amplification reduces the efficiency of DNA
amplification during solid phase PCR (Kohsaka H, DA Carson. Solid phase polymerase
chain reaction. Journal of Clinical Laboratory Analysis 1994; 8: 452-455).

Figure 8 shows that the reagent supply section 30 of the device also contains
DNA polymerase and ddNTPs. These have not been mentioned in the above
description, but are used in the sequencing methods of Embodiments 3 and 4 described
above with relation to Figures 3 and 4, using labeled ddNTPs. In the method of Figure
3, the automated apparatus is operated so that following FokI digestion, magnetic
binding, and washing, the DNA templates are incubated with a DNA polymerase and the
four nucleotide terminators, each with a unique label. Following magnetic binding and
washing, the incorporated label at each address is identified using the charge coupled
device or other detector and, as before, the readings are passed as ordered information to
the microprocessor data handler to note the additional nucleotide or nucleotides read at
each site. Then, an adaptor is ligated to each of the templates. This is followed by PCR
amplification which regenerates the next set of template precursors for the next
sequencing cycle.

The above described automated process is highly efficient. By using unique
adaptors and corresponding amplification primers during each sequencing cycle, about
twenty sequencing cycles can be carried out, resulting in the sequencing of 180
nucleotides, of which typically at least 160 nucleotides will lie outside the primer in the
end being sequenced. Thus, provided these DNA segments do not contain an internal
FokI recognition domain, the above-described steps will generate 1,600,000 nucleotides
of new sequence from a single 100 x 100 well chip. Since FokI has a five bp
recognition domain, this domain is predicted to occur approximately every 1000 bp
(4^5 = 1024) in random sequence. If the average size of each amplified fragment lying
between the amplifying primers is 300 bp, then about 30% of the DNA segments to be
sequenced will contain an internal FokI site and will not be successfully sequenced using
only this simple protocol. Thus, in DNA sequences with a random distribution of equal
numbers of GGATG nucleotides, about 70% of the fragments will be successfully
sequenced, resulting in the sequencing of approximately 1,120,000 nucleotides rather
than 1,600,000.
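
The throughput estimate above can be reproduced, for illustration only, with the following minimal Python sketch. The numerical values are those stated in the text; the variable names are assumptions introduced here, and the sketch follows the simple linear estimate of the text (fragment length divided by expected site spacing) rather than any more refined probability model.

```python
# Illustrative arithmetic only; all names are hypothetical.
ADDRESSES = 90_000            # defined addresses on the chip(s)
TEMPLATES_PER_REGION = 9      # nine offset initial templates per DNA region
CYCLES = 20                   # approximate number of sequencing cycles
STRIDE = 9                    # nucleotides spanned per FokI cycle
NEW_BASES_PER_REGION = 160    # nucleotides lying outside the primer
SITE_LENGTH_BP = 5            # length of the FokI recognition domain
FRAGMENT_BP = 300             # average amplified fragment size

regions = ADDRESSES // TEMPLATES_PER_REGION        # 10,000 regions
bases_per_region = CYCLES * STRIDE                 # 180 nucleotides read
ideal_yield = regions * NEW_BASES_PER_REGION       # 1,600,000 nucleotides

site_spacing = 4 ** SITE_LENGTH_BP                 # 1024 bp between sites
fraction_with_site = FRAGMENT_BP / site_spacing    # about 0.29
fraction_rounded = round(fraction_with_site, 1)    # "about 30%" in the text
expected_yield = round(ideal_yield * (1 - fraction_rounded))

print(regions, bases_per_region, ideal_yield, site_spacing, expected_yield)
```

The rounding of the affected fraction to 30% is done only to match the figures given in the text.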

This processing obstacle imposed by pre-existing FokI recognition domains may
be addressed by hemi-methylating these recognition domains. The methods described in
Figures 1 and 3 do not provide for the hemi-methylation of those FokI recognition
domains that lie outside the adaptor encoded domain. Prior studies such as Looney MC,
LS Moran, WE Jack, GR Feehery, JS Benner, BE Slatko, GG Wilson. Nucleotide
sequence of the Fok I restriction-modification system: Separate strand-specificity
domains in the methyltransferase. Gene 1989; 80: 193-208 have shown that
hemimethylation of the FokI recognition domain prevents cutting from being mediated
by these domains. However, since each strand of the FokI recognition domain contains
the adenosine nucleotide, the PCR based method described by Padgett and Sorge in
Padgett KA, JA Sorge. Creating seamless junctions independent of restriction sites in
PCR cloning. Gene 1996; 168: 31-35 cannot be used to selectively hemi-methylate the
adenosine nucleotides in such internal sites. Rather, when carrying out the invention
with FokI, hemi-methylation requires the use of the method of Han and Rutter described
in Han J, Rutter WJ. λgt22S, a phage expression vector for the directional cloning of
cDNA by the use of a single restriction enzyme SfiI. Nucleic Acids Res 1988; 16: 11837
as noted above.

The method is thus augmented by the following step: Following PCR
amplification, binding to streptavidin and magnetic pelleting, the non-biotinylated strand
is removed by denaturation and magnetic pelleting, followed by washing to remove
reagents and primers. Since FokI cutting requires a double-stranded recognition domain,
as reported by Podhajska AJ, W Szybalski. Conversion of the Fok I endonuclease to a
universal restriction enzyme: Cleavage of phage M13mp7 DNA at predetermined sites.
Gene 1985; 40: 175-182, this site is recreated, and the internal FokI sites are
hemi-methylated, by using a primer encoding the FokI recognition domain. This primer is
complementary to the lower strand of the ligated sequencing adaptor through the adenine
moiety in the FokI recognition domain, and polymerization occurs using four
nucleotides except that N6-methyl-dATP is substituted for dATP. This process thus
regenerates the adaptor encoded FokI recognition domain and hemimethylates those
recognition domains that lie internal to the sequencing adaptor encoded domain. The
DNA segments, once hemi-methylated, are then sequenced by the automated steps
described above.
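
As a hedged illustration, segments that carry an internal FokI recognition domain, and would therefore call for the hemi-methylation step, could be flagged in software before sequencing. The following minimal Python sketch assumes the segment sequence is already known or estimated; the function names and example sequences are hypothetical. It scans for GGATG on either strand.

```python
# Hypothetical helper: locate internal FokI recognition domains (GGATG)
# on either strand of a DNA segment written 5'->3'.

FOKI_SITE = "GGATG"
_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of an uppercase A/C/G/T string."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq))


def internal_foki_sites(segment):
    """Return 0-based start positions (top-strand coordinates) of GGATG
    or its reverse complement CATCC within the segment."""
    segment = segment.upper()
    targets = {FOKI_SITE, reverse_complement(FOKI_SITE)}
    k = len(FOKI_SITE)
    return [i for i in range(len(segment) - k + 1) if segment[i:i + k] in targets]


# Made-up example sequences.
clean_segment = "ACGTACGTTTGACCA"
segment_with_sites = "ACGGATGTTTCATCCA"

print(internal_foki_sites(clean_segment))       # no internal sites
print(internal_foki_sites(segment_with_sites))  # one site on each strand
```

A segment for which the list is empty could be processed with the simple protocol; any other segment would be routed through the hemi-methylation step described above.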

The invention contemplates a number of practical implementations of novel chip- 
based support arrays for carrying out the described steps in an automated manner. 

Chips that house 50,000 DNA segments can be generated by microfabrication of
microchambers using photolithography following the approaches and teachings of
Wilding P, MA Shoffner, LJ Kricka. PCR in a silicon microstructure. Clinical
Chemistry 1994; 40: 1815-1818; of Kikuchi Y, K Sato, H Ohki, T Kaneko. Optically
accessible microchannels formed in a single-crystal silicon substrate for studies of blood
rheology. Microvascular Research 1992; 44: 226-240; of Woolley AT, RA Mathies.
Ultra-high-speed DNA fragment separations using microfabricated capillary array
electrophoresis chips. Proc. Natl. Acad. Sci. USA 1994; 91: 11348-11352; of Baxter
GT, LJ Bousse, TD Dawes, JM Libby, DN Modlin, JC Owicki, JW Parce.
Microfabrication in silicon microphysiometry. Clin. Chem. 1994; 40: 1800-1804; of
Kricka LJ, X Ji, O Nozaki, P Wilding. Imaging of chemiluminescent reactions in
mesoscale silicon-glass microstructures. J. Biolumin. 1994; 9: 135-138; or may be
fabricated using molded or etched polymers as described by Matson RS, J Rampal, SL
Pentoney Jr, PD Anderson, P Coassin. Biopolymer synthesis on polypropylene supports:
Oligonucleotide arrays. Analytical Biochemistry 1995; 224: 110-116. Alternatively,
chip addresses may be separated by hydrophobic borders which may, for example, be
implemented with conventional sample cell construction techniques or formed by
processes of lithography and chemical treatment. Movement of the reagents to and from
this chip can be done using pumps as reported in Burns MA, CH Mastrangelo, TS
Sammarco, FP Man, JR Webster, BN Johnson, B Foerster, D Jones, Y Fields, AR
Kaiser, DT Burke. Microfabricated structures for integrated DNA analysis. Proc. Natl.
Acad. Sci. USA 1996; 93: 5556-5561 and in Wilding P, J Pfahler, HH Bau, JN Zemel,
LJ Kricka. Manipulation and flow of biological fluids in straight channels
micromachined in silicon. Clinical Chemistry 1994; 40: 43-47. Alternatively, fluids
may be brought to the sites by centrifugal force.

In this case the overall requirements for conduits, valves and wash-out passages
may be substantially reduced, as it is only necessary to supply each reagent or solution to
a central position communicating with the array. The array itself may mount in a
shallow tray or cover assembly which effectively channels the flow to the array sites. In
general, the sequencing method of the invention does not require the transfer of small
amounts of liquids through capillaries, and therefore avoids many of the technological
obstacles resulting from shearing forces encountered in low diameter capillary flow, as
reported in Wilding P, J Pfahler, HH Bau, JN Zemel, LJ Kricka. Manipulation and flow
of biological fluids in straight channels micromachined in silicon. Clinical Chemistry
1994; 40: 43-47.

Figure 9 shows an embodiment of a system 110 in which movement of reagents
onto chips is effected by centrifugal force. In this device, the chips 10' are on a
turntable. Reagents are placed closer to the center of the turntable, and rotating the
turntable drives the reagents radially outward directly to one or more chips. Centrifugal
force also allows reagents to be removed from chips. A chip or chip holder itself is
preferably configured for flow-through operation to simplify and enhance the removal of
reagents (see, e.g., Beattie KL, WG Beattie, L Meng, SL Turner, R Coral-Vazquez, DD
Smith, PM McIntyre, DD Dao. Advances in genosensor research. Clinical Chemistry
1995; 41: 700-706).

In the device 110, illustratively set up for the processes described herein, nine
support arrays 11a, 11b, ... 11i are located around a rotating stage with each
communicating at a radially innermost corner with a corresponding flow supply conduit
12a, 12b, ... 12i. Outlets (not shown) may be to a common drain. Thus each support
array in this device embodiment may receive a separate set of reagents. For example,
the nine arrays may be initially loaded with identical DNA samples in each respective
well, and then all samples in an array processed to produce templates offset by a fixed x,
with x = {1, 2, ... 9} different for each array. Once the nine sets of templates on the
corresponding supports have been created, running the sequencing process steps of the
present method then produces a continuous nucleotide sequence for each of the initial
segments.
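
The way in which nine offset template sets combine into one continuous sequence may be sketched, for illustration only, as follows. This minimal Python sketch simulates the bookkeeping rather than the chemistry: each array reads one nucleotide per cycle at its own offset x and then advances nine nucleotides, and the reads from all nine arrays are merged by position. The function names and the example segment are hypothetical.

```python
# Hypothetical simulation of merging reads from nine offset arrays.
STRIDE = 9  # nucleotides advanced per cycle (the FokI interval in the text)


def read_offset_template(segment, offset, cycles):
    """Simulate one array: read one nucleotide per cycle, starting at
    `offset` (1-based, x = 1..9) and advancing STRIDE positions per cycle."""
    reads = []
    for cycle in range(cycles):
        position = (offset - 1) + cycle * STRIDE
        if position >= len(segment):
            break
        reads.append((position, segment[position]))
    return reads


def assemble(segment, cycles):
    """Merge the reads of all nine offset arrays into one contiguous string."""
    calls = {}
    for offset in range(1, STRIDE + 1):
        for position, base in read_offset_template(segment, offset, cycles):
            calls[position] = base
    return "".join(calls[p] for p in sorted(calls))


segment = "ATGGCGTACCTTAGGCTAACGTTAGCATCGGATTACA"  # made-up 37-nucleotide example
assembled = assemble(segment, cycles=3)
print(assembled)  # first 27 nucleotides of the segment
```

With three cycles on each of the nine arrays, 27 adjacent positions are covered, each read exactly once.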

When performing the amplification steps, during incubations, the magnetic
streptavidin bound DNA can be suspended by shaking or by magnetic oscillation as
described in the product information on MixSep, Sigris Research, Inc., Brea, CA. To
retain a small portion of the magnetic particles prior to the addition of PCR reagents and
PCR amplification, the magnetic pelleting can be adjusted electrically. In the chip
embodiment, PCR thermal cycling is very efficient, since heat transfer occurs rapidly
over short distances. The thermal cycler can be a Peltier heater-cooler device built into
the stage, a set of fixed temperature plates or baths which are successively placed in
thermal contact with the chips, or an air oven (see, for example, Meier-Ewert S, E Maier,



WO 99/45153 PCT/US99/04883 


A Ahmadi, J Curtis, H Lehrach. An automated approach to generating expressed
sequence catalogues. Nature 1993; 361: 375-376; Drmanac S, R Drmanac. Processing
of cDNA and genomic kilobase-size clones for massive screening, mapping, and
sequencing by hybridization. BioTechniques 1994; 17: 328-336; Wilding P, MA
Shoffner, LJ Kricka. PCR in a silicon microstructure. Clinical Chemistry 1994; 40:
1815-1818; and Shoffner MA, J Cheng, GE Hvichia, LJ Kricka, P Wilding. Chip PCR.
I. Surface passivation of microfabricated silicon-glass chips for PCR. Nucleic Acids
Research 1996; 24: 375-379). Reading the identity of incorporated label can be carried
out using a charge coupled device, as described above, or using a fluorescent
microscope, fiber-optic detectors, biosensors, gas phase ionization detector, or a
phosphorimager as described in Kinjo M, R Rigler. Ultrasensitive hybridization
analysis using fluorescence correlation spectroscopy. Nucleic Acids Research 1995; 23:
1795-1799; Mauro JM, LK Cao, LM Kondracki, SE Walz, JR Campbell. Fiber-optic
fluorometric sensing of polymerase chain reaction-amplified DNA using an immobilized
DNA capture protein. Analytical Biochemistry 1996; 235: 61-72; Nilsson P, B Persson,
M Uhlen, P Nygren. Real-time monitoring of DNA manipulations using biosensor
technology. Analytical Biochemistry 1995; 224: 400-408; Eggers M, D Ehrlich. A
review of microfabricated devices for gene-based diagnostics. Hematologic Pathology
1995; 9: 1-15.

Even without special biochip microfabrication, the methods of the present
invention are advantageously implemented in a device that operates in a microtiter plate
format. In this case the construction of the subassemblies for the scintillation or
fluorescence counting of multi-well microtiter plates and for the automated picking of
colonies into the wells, as well as the necessary reagent introduction and thermal cycling
to amplify DNA simultaneously in multiple multi-well microtiter plates, allows the
simultaneous amplification, treatment and reading of the array of samples. Indeed, with
prior art subassemblies handling 120 plates, each with 384 wells, 46,080 samples may be
processed simultaneously. Therefore, the sequencing protocol estimated to sequence
160 nucleotides in a clone insert would sequence simultaneously 204,800 nucleotides
from 1280 clones using a single 120 plate thermal cycler, 384 well scintillation counter,
one radiolabel, a 384 pin transfer device (e.g., a hedgehog comb) and a robotic pipetter.
[46,080 wells/9 initial templates = 5120 samples; 5120/4 ligations = 1280 samples
(clones). 1280 clones x 160 nucleotides/clone = 204,800 nucleotides]. (Meier-Ewert S,
E Maier, A Ahmadi, J Curtis, H Lehrach. Nature 1993; 361: 375-376.)
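
The bracketed arithmetic above can be checked with the following minimal Python sketch, offered for illustration only. The values are those given in the text; the variable names are assumptions, as is the comment reading the factor of four as one ligation per adaptor when a single radiolabel is used.

```python
# Illustrative check of the microtiter plate throughput figures.
PLATES = 120
WELLS_PER_PLATE = 384
INITIAL_TEMPLATES = 9         # nine offset initial templates per sample
LIGATIONS_PER_TEMPLATE = 4    # assumed: one ligation per adaptor with a single radiolabel
NUCLEOTIDES_PER_CLONE = 160

wells = PLATES * WELLS_PER_PLATE              # 46,080 wells
samples = wells // INITIAL_TEMPLATES          # 5,120 samples
clones = samples // LIGATIONS_PER_TEMPLATE    # 1,280 clones
nucleotides = clones * NUCLEOTIDES_PER_CLONE  # 204,800 nucleotides

print(wells, samples, clones, nucleotides)
```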

With the foregoing overview of the organization of a method and apparatus for 
large scale or multiplex processing of collections of segments, a detailed description will 
now be given of several embodiments of the sequencing method as applied to a single 
segment. 

This invention is further illustrated by the following Exemplification which 
should not be construed as limiting. The contents of all references and published patents 
and patent applications cited throughout the application are hereby incorporated by 
reference. 



Experimental Strategy:

The present invention allows one to sequence numerous DNA segments in 
parallel without running a gel. It is an iterative method that allows one to sequence 
DNA in fixed intervals of greater than one nucleotide, and provides a means for 
regenerating the desired DNA segment following each iterative cycle. This is 
accomplished by the iterative application of a DNA ligase and an enzyme, e.g., a class- 
IIS restriction endonuclease, to generate templates for DNA sequencing. One simple 
schematic is outlined below. 



[Schematic: DNA with overhang -> ligation to an adaptor containing a IIS restriction endonuclease recognition domain -> IIS restriction endonuclease digestion -> DNA with overhang]

In each cycle, adaptor ligation to one end of the DNA segment is followed by
class-IIS restriction endonuclease cutting. The recognition domain of the class-IIS
restriction endonuclease is encoded by the ligated adaptor, allowing restriction

endonuclease digestion to trim the DNA segment, generating a new overhang sequence. 
One or both strands of an adaptor can be ligated, or one or both ends of a single-strand
hairpin adaptor can be ligated. Also, one strand of an adaptor can be ligated followed by
hybridization, without ligation of the complementary strand, to generate a
double-stranded recognition domain. Iterative cycles generate a series of single-strand
overhangs, each constituting a DNA template. The single-stranded overhangs are
separated by fixed intervals that are limited by the distance between the recognition
domain and the cut site in the cleavage domain for the class-IIS restriction endonuclease
encoded by the ligated adaptor. This method exploits the separation of the cleavage
domain and the recognition domain of class-IIS restriction endonucleases by allowing
sequencing in strides limited only by the distance between the recognition domain
and the cleavage domain cut sites, distinguishing it from other iterative approaches.
Since each DNA template is a short single-stranded region attached to double-stranded
DNA, these single-strands have little opportunity to form secondary structures,
providing a considerable advantage over competing methods.

The overhang generated after each cycle constitutes a DNA template that is 
sequenced in one of a variety of ways. One way uses template-directed DNA ligation to 
discriminate between nucleotides at the ligation junction, allowing this ligation to 
generate sequence information. This is illustrated below: 



[Schematic: DNA with overhang -> template-directed ligation to one of four adaptors sharing a IIS restriction endonuclease recognition domain, each adaptor distinguished by a tag identifying the nucleotide at the ligation junction -> identification of the ligated adaptor by detection of the tag -> IIS restriction endonuclease digestion -> DNA with overhang]



Successful ligation requires that an adaptor's single-stranded end be
complementary to the double-stranded DNA's single-stranded overhang sequence at the
ligation junction. Four adaptors (or adaptor subsets) are used during each ligation, with
each of the four adaptors differing at the nucleotide positioned to undergo ligation at the
template-directed ligation junction. Ligation to one of the four adaptors and
identification of that adaptor allows identification of the nucleotide at the ligation
junction, thus generating sequence information. Sequencing can be accomplished by
fluorometry using adaptors tagged with distinct fluorescent labels. This is followed by
class-IIS restriction endonuclease mediated end trimming of the DNA using the
recognition domain encoded by the ligated adaptor. This recognition domain is
positioned so that cleavage results in the removal of nucleotides from each strand of the
DNA, creating a new template for subsequent template-directed ligation to one of four
adaptors or adaptor subsets. This strategy can use an enzyme, e.g., a class-IIS
restriction endonuclease, that generates either a 5' or a 3' overhang sequence, as either
type of overhang can serve as a template for template-directed ligation.
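
The logic of reading a nucleotide by template-directed ligation may be illustrated with the following minimal Python sketch. It is a deliberate simplification and not the protocol itself: the template is treated as a plain string, only the single junction nucleotide is modeled, the label names are hypothetical, and a stride of nine nucleotides per cut is assumed as in the FokI example given elsewhere herein.

```python
# Hypothetical model: four adaptors differ only at the junction nucleotide,
# and each carries a distinct label. Only the adaptor whose junction
# nucleotide pairs with the template nucleotide is taken to ligate.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
ADAPTOR_LABEL = {"A": "dye_1", "C": "dye_2", "G": "dye_3", "T": "dye_4"}
LABEL_TO_JUNCTION = {label: nt for nt, label in ADAPTOR_LABEL.items()}


def ligate(template_base):
    """Return the label of the adaptor whose junction nucleotide is
    complementary to the template nucleotide at the ligation junction."""
    return ADAPTOR_LABEL[COMPLEMENT[template_base]]


def decode(label):
    """Infer the template nucleotide from the detected adaptor label."""
    return COMPLEMENT[LABEL_TO_JUNCTION[label]]


def sequence_by_ligation(template, stride=9, cycles=3):
    """Read one nucleotide per cycle, then trim `stride` nucleotides,
    standing in for the class-IIS restriction endonuclease cut."""
    calls = []
    position = 0
    for _ in range(cycles):
        if position >= len(template):
            break
        calls.append(decode(ligate(template[position])))
        position += stride
    return "".join(calls)


template = "GATTACAGGCTTAACGTCAGTTACGAT"  # made-up example
print(sequence_by_ligation(template))     # nucleotides at positions 0, 9 and 18
```

Because each detected label maps to exactly one junction nucleotide, identifying the ligated adaptor is sufficient to identify the template nucleotide in this model.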

Another approach uses template-directed polymerization instead of
template-directed ligation to sequence DNA. In this case, adaptor ligation can be
template-directed but is not used to discriminate between nucleotides at the ligation junction.
Sequencing occurs through a separate template-directed DNA polymerization step. In
order to use template-directed polymerization to sequence the overhang sequence, the
overhang must be a 5' overhang, since template-directed polymerization requires a
recessed 3' end. A simple schematic of this approach is outlined below.



[Schematic: DNA -> ligation to an adaptor containing a IIS restriction endonuclease recognition domain -> IIS restriction endonuclease digestion -> template-directed polymerization with the four nucleotide extension terminators, each with a distinct label -> identification of the incorporated label]



Ligation can be template-directed, occurring using an adaptor with a
promiscuous nucleotide or nucleotides (degenerate or universal) at the ligation junction,
so that this ligation is not used to discriminate between nucleotides at the ligation
junction, and therefore does not generate sequence information. Ligation of the adaptor
is followed by class-IIS restriction endonuclease trimming, generating a 5' overhang
sequence. The 5' overhang has a recessed 3' end, forming a substrate for
template-directed DNA polymerization. Template-directed polymerization occurs in the presence
of each of the four labeled nucleotide terminators (e.g. ddNTPs). These nucleotide
terminators can each have distinct fluorescent tags, so that following incorporation of
one of these labeled nucleotide terminators, a fluorometer can identify the incorporated
nucleotide (Prober JM, Trainor GL, Dam RJ, Hobbs FW, Robertson CW, Zagursky RJ,
Cocuzza AJ, Jensen MA, Baumeister K., Science 1987; 238:336-341). Iterative cycles
of adaptor ligation and IIS cutting create new templates for sequencing by
template-directed polymerization.
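
The single-terminator extension step may likewise be sketched, purely for illustration, in Python. The sketch assumes that the 5' overhang is written 5' to 3', so that its last nucleotide lies next to the recessed 3' end and templates the one terminator that is added; the label names and the example overhang are hypothetical.

```python
# Hypothetical model of template-directed incorporation of one labeled
# nucleotide terminator at a recessed 3' end.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
TERMINATOR_LABEL = {"A": "label_1", "C": "label_2", "G": "label_3", "T": "label_4"}
LABEL_TO_TERMINATOR = {label: nt for nt, label in TERMINATOR_LABEL.items()}


def extend_with_terminator(overhang):
    """Return the label of the terminator incorporated opposite the overhang
    nucleotide adjacent to the recessed 3' end (the last nucleotide of the
    overhang when written 5'->3')."""
    if not overhang:
        raise ValueError("a 5' overhang of at least one nucleotide is required")
    template_base = overhang[-1].upper()
    return TERMINATOR_LABEL[COMPLEMENT[template_base]]


def template_base_from_label(label):
    """Infer the template nucleotide from the detected terminator label."""
    return COMPLEMENT[LABEL_TO_TERMINATOR[label]]


detected = extend_with_terminator("ACGT")      # made-up four-nucleotide overhang
read_base = template_base_from_label(detected)
print(detected, read_base)
```

Because a terminator blocks further extension, exactly one label is incorporated per template in each cycle in this model.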

One obstacle inherent in iterative methods that generate a product is that even if 
the constituent enzymatic steps approach 100% completion, incompletely processed 
products can accumulate to significant levels. For example, during oligonucleotide 
synthesis of a 70-mer, requiring 69 couplings, a 99% coupling efficiency results in only
50% of the generated oligonucleotides being full length (0.99^69 ≈ 0.50). The present
invention eliminates this problem by allowing one to sequence in intervals of greater
than one nucleotide. For example, the FokI recognition domain is separated from its
cleavage domain by nine nucleotides. Using a FokI based protocol, single-strand
overhangs can be generated in each cycle that are separated by nine nucleotide long 
intervals over time and space, so that five cycles will allow one to span 45 nucleotides, 
instead of just five nucleotides using an iterative method that sequences consecutive 
nucleotides (e.g. the base addition DNA sequencing scheme). This is termed striding, as 
it covers a considerable stretch of DNA with few iterative steps. Sequencing single 
nucleotides in intervals of greater than one nucleotide requires the sequencing of the 
nucleotides that fall within each interval. One sequencing method generates DNA 
templates separated by intervals of nine nucleotides, and sequences a single nucleotide in 
each template, by making nine initial templates for each DNA segment being sequenced, 
such that sequencing these nine initial templates will sequence nine adjacent nucleotides. 
The nine initial templates can be generated by ligating one end of each DNA segment to 
be sequenced to nine distinct adaptors in nine separate ligations, each adaptor containing 
a FokI recognition domain, with these domains offset from each other by one base pair 
when comparing adjacently positioned recognition domains. In one embodiment, the 
DNA segment to be sequenced is generated by PCR amplification, and offset recognition 
domains are incorporated during PCR amplification by encoding the recognition domain 
into one of the amplifying primers according to the method of Mullis K, Faloona F, 
Scharf S, Saiki R, Horn G, Erlich H., Cold Spring Harbor Symposia on Quantitative 
Biology, Cold Spring Harbor Laboratory, LI:263-273. When the DNA samples to be 
sequenced are vector inserts, as in a genomic or cDNA library, a set of initial template 
precursors can be generated for each DNA insert to be sequenced using a single set of 
initial adaptors. For example, following digestion with a restriction endonuclease that 
cuts the vector adjacent to each insert, offset recognition domains can be appended to 
each of the numerous vector inserts through ligation to each of the initial adaptors. This 
can be followed by PCR, to seal nicks and retrieve the product. An alternative approach 
is to use PCR alone to generate offset recognition domains. For example, when 
sequencing DNA libraries, primers can be designed to anneal to a vector sequence 
immediately flanking each insert. Once this set of DNA segments with offset (i.e., 
staggered) recognition domains is generated for each DNA segment to be sequenced, 
these DNA segments can be sequenced concurrently, so that the number of steps 
necessary to sequence a contiguous stretch of DNA in the original DNA segment is 
markedly reduced. Using any of the above approaches, only a few primers must be 
made to sequence numerous vector inserts. Furthermore, each of the nine products can 
have a uniquely positioned recognition domain, so that digestion with FokI cleaves both 
strands of each DNA segment and generates a set of nine overhang sequences positioned 
as a staggered array separated by one base pair. Generating several initial DNA 
templates for each DNA segment to be sequenced diminishes the number of successive 
steps necessary to sequence a given stretch of DNA, and therefore significantly 
diminishes the accumulation of background signal when sequencing over a given span of 
DNA. 
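
The arithmetic in the preceding paragraph can be sketched as follows. This is a minimal illustration rather than part of the described method; the function names are hypothetical, and positions are counted from zero relative to the first sequenced nucleotide.

```python
def full_length_fraction(step_efficiency, steps):
    """Fraction of product that completed every one of `steps` successive steps."""
    return step_efficiency ** steps


def span_covered(cycles, interval):
    """Nucleotides spanned when each cycle advances by `interval` nucleotides."""
    return cycles * interval


def positions_sequenced(offset, cycles, interval=9):
    """Positions read from one initial template whose first read is at `offset`."""
    return [offset + cycle * interval for cycle in range(cycles)]


def staggered_coverage(n_templates, cycles, interval=9):
    """Positions read when `n_templates` initial templates are offset by one base each."""
    covered = set()
    for offset in range(n_templates):
        covered.update(positions_sequenced(offset, cycles, interval))
    return sorted(covered)


if __name__ == "__main__":
    # 69 couplings at 99% efficiency leave about half of the product full length.
    print(round(full_length_fraction(0.99, 69), 2))
    # Five cycles at nine-nucleotide intervals span 45 nucleotides.
    print(span_covered(5, 9))
    # Nine staggered templates, five cycles each, read 45 contiguous positions.
    print(len(staggered_coverage(9, 5)))
```

With nine staggered templates, a contiguous stretch of 45 nucleotides requires only five successive cycles per template, rather than 45 successive steps on a single template.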

In order to regenerate the product of interest following each cycle of restriction 
endonuclease digestion and adaptor ligation, an additional step is designed. Specifically, 
this invention uses adaptor ligation during each sequencing cycle. These ligated 
adaptors can differ during each cycle (or every several cycles), allowing the product 
generated following each cycle of restriction endonuclease digestion and template- 
directed ligation to have a unique end created by the ligated adaptor. This unique end 
can generate a primer annealing site during PCR, such that PCR can amplify the desired 
product over a million fold following each adaptor ligation step (Saiki RK, DH Gelfand, 
S Stoffel, SJ Scharf, R Higuchi, GT Horn, KB Mullis, HA Erlich, Science 1988; 
239:487-491). Nucleic acid amplification in vitro can be exponential, as is usually done, 
or linear, in which one primer undergoes one or more cycles of primer extension, 
followed by its removal and cycles of single primer extension using the opposite primer. 
This in vitro amplification step replenishes the desired product (some product is 
inevitably lost in prior steps), and prevents uncut products or unligated products from 
generating background signal. It also regenerates the template precursor by eliminating 
base mismatches, nicks, and displaced ends lying between the recognition domain and 
the cleavage domain following adaptor ligation. Thus, cutting efficiencies need not 
approach 100%; this method allows one to use lower concentrations of restriction 
endonuclease that preferably cut with very high specificity (> 99.9%) for the canonical 
recognition domain (Fuchs R, R Blakesley, Methods in Enzymology 1983; 100:3-38). 
Furthermore, this method works well even when DNA ligation is inefficient, as when 
ligating fragments with a single nucleotide overhang, because the desired template 
precursor can be readily amplified over one million fold using PCR amplification. Also, 
following fill-in with labeled ddNTPs, even if the label interferes with ligation, only a 
fraction of those filled in would need to be labeled, as product regeneration through 
amplification in vitro does not require a large proportion of the filled-in product to 
undergo efficient ligation. The remaining product could either not undergo fill-in (in the 
presence of low numbers of labelled ddNTPs) or undergo fill-in in the presence of 
unlabelled ddNTPs (along with labelled ddNTPs). When using nucleic acid 
amplification in vitro to re-generate each template-precursor, the adaptor does not need 
to have a double-stranded recognition domain, as the recognition can be encoded by an 
adaptor containing only a single-strand of the recognition domain, with the double- 
stranded recognition domain generated during the nucleic acid amplification in vitro. 
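
As a rough illustration of the amplification figures above, the following sketch estimates how many exponential cycles are needed to exceed a million-fold amplification. The per-cycle `efficiency` parameter and the function names are assumptions introduced for illustration; the idealized model ignores plateau effects and is not a description of the actual reaction kinetics.

```python
import math


def exponential_fold(cycles, efficiency=1.0):
    """Idealized fold amplification: each cycle multiplies product by (1 + efficiency)."""
    return (1.0 + efficiency) ** cycles


def cycles_for_fold(target_fold, efficiency=1.0):
    """Smallest whole number of exponential cycles reaching at least `target_fold`."""
    return math.ceil(math.log(target_fold) / math.log(1.0 + efficiency))


def linear_copies(cycles):
    """Single-primer extension: each cycle adds one copy per starting template."""
    return cycles


if __name__ == "__main__":
    print(cycles_for_fold(1e6))        # cycles needed with perfect doubling
    print(exponential_fold(20))        # fold amplification after 20 perfect cycles
    print(linear_copies(20))           # copies per template after 20 linear cycles
```

Under perfect doubling, 20 cycles already exceed a million-fold, which is why a small aliquot of each ligation can serve as the input for regenerating the template precursor.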

In one embodiment, recognition domains for the class-IIS restriction 
endonuclease used to generate the DNA templates that occur in the original DNA 
segment (internal to the ligated adaptor), are methylated or otherwise blocked to prevent 
cutting mediated by these internal domains. Blocking of internal recognition domains 
can be accomplished by treatment with the corresponding methylase (FokI methylase 
for FokI restriction endonuclease; Kita K, H Kotani, H Sugisaki, M Takanami, J Biol 
Chem 1989; 264:5751-5756; Looney MC, LS Moran, WE Jack, GR Feehery, JS Benner, 
BE Slatko, GG Wilson, Gene 1989; 80:193-208), prior to adaptor ligation. This 
prevents cutting mediated by these internal recognition domains, without preventing 
cleavage directed by the ligated adaptor (whose recognition domain is not methylated). 

Hemi-methylation of these internal recognition domains can be carried out using 
the strategy of Han and Rutter or using the PCR-based strategy of Padgett and Sorge, as 
described in more detail herein (Han J, Rutter WJ, Nucleic Acids Res 1988; 16:11837; 
Padgett KA, JA Sorge, Gene 1996; 168:31-35). Each strategy hemi-methylates, and 
effectively blocks, internal recognition domains without methylating the primer-encoded 
recognition domain. The method of Padgett and Sorge cannot be used if each strand of 
the chosen recognition domain contains all four nucleotides, because PCR amplification 
cannot be carried out with selective methylation of those recognition domains that lie 
outside of the primer encoded recognition domain, as the strand antisense to the primer's 
recognition domain will be methylated during PCR. The method described by Han and 
Rutter can hemi-methylate the internal recognition domains regardless of the nucleotide 
composition of each strand of the recognition domain, and it can be incorporated into a 
linear amplification step. 
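
The composition constraint on the PCR-based strategy can be illustrated with a short sketch. It checks only nucleotide composition: a candidate methylated nucleotide must be absent from the strand antisense to the primer-encoded recognition sequence, and (an assumption added here) present in the primer-sense sequence so that internal copies of the domain become methylated. It does not predict whether a given methylated base actually blocks a given enzyme. The function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the antisense strand of `seq`, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def selective_methylation_candidates(primer_site):
    """Nucleotides present in the primer-sense site but absent from its antisense strand.

    An empty list means no single methylated nucleotide satisfies the
    composition constraint for this recognition sequence.
    """
    sense = primer_site.upper()
    antisense = reverse_complement(sense)
    return [base for base in "ACGT" if base in sense and base not in antisense]


if __name__ == "__main__":
    print(selective_methylation_candidates("GGATG"))  # FokI site, upper strand in primer
    print(selective_methylation_candidates("CATCC"))  # FokI site, lower strand in primer
    print(selective_methylation_candidates("ACGT"))   # each strand has all four bases
```

A recognition sequence whose strands each contain all four nucleotides returns an empty list, which corresponds to the case in which the PCR-based strategy cannot be used.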

The PCR-based method of Padgett and Sorge has the advantage of allowing the 
simultaneous exponential amplification of the product of interest along with 
hemi-methylation of the internal recognition domains. This is accomplished by amplification 
with a methylated nucleotide that does not lie within the sequence antisense to the 
recognition domain sequence in the amplifying primer, and can be carried out using 
ligated adaptors and amplifying primers that vary during each cycle (or every several 
cycles) as described. In this case, however, the 3' end of each amplifying primer must 
encode at least a portion of the restriction endonuclease recognition domain of the 
class-IIS restriction endonuclease used to trim the DNA segment. This may diminish the 
specificity of the PCR amplification for the product of interest, as these shared 3' ends 
may result in some amplification of uncut DNA products. The strategy of Han and 
Rutter can be modified to linearly amplify the product of interest, while simultaneously 
hemi-methylating the internal recognition domains. This can be carried out by iterative 
primer extensions using the primer encoding at least a portion of the recognition domain, 
with a methylated nucleotide substituting for the corresponding non-methylated 
nucleotide, before or after reiterative primer extensions with the opposite primer using 
the four normal dNTPs. Also, one could use a primer encoding the recognition domain 
for FokI and undergo PCR amplification with 6-methyl dATP substituted for dATP. 
This would double methylate each recognition domain for FokI, that is methylate each 
strand of the double stranded recognition domain, except for the primer encoded strand, 
which would be hemi-methylated, so that during digestion with a mutant FokI restriction 
endonuclease isolated by Waugh and Sauer (Waugh, D.S., and Sauer, R.T., J. Biol. 
Chem., 269:12298-12303 (1994)), that can cut via hemi-methylated FokI recognition 
domains, but will not cut via double-methylated FokI recognition domains, only the 
primer directed recognition domain would be recognized and mediate cleavage. The 
primer directed domain need not contain the entire recognition domain, but only the 
GGA portion of the upper strand GGATG FokI recognition domain sequence, since this 
will prevent methylation of adenine in the primer's upper strand recognition domain 
during PCR. The genetic screen strategy outlined by Waugh and Sauer could also be 
used to isolate such mutants for other class-IIS restriction endonucleases. Any of the 
above strategies for methylating internal recognition domains can be carried out following in 
vitro amplification of the product of interest, and such prior in vitro amplification could 
occur through PCR or a related method, such as strand displacement amplification 
(Walker GT, MS Fraiser, JL Schram, MC Little, JG Nadeau, DP Malinowski, Nucleic 
Acids Research 1992; 20:1691-1696). Such prior DNA amplification in vitro need not 
have a portion of the recognition domain incorporated into any of the amplifying 
primers, allowing exquisite specificity during product regeneration. 


EXAMPLE 1 

Demonstration of Interval Sequencing Mediated by Class-IIS Restriction 
Endonuclease Generated 5' Overhangs and Template-Directed Ligation 

Using a FokI based strategy, single nucleotides separated by intervals of nine 
nucleotides were sequenced using simple reagents and a scintillation counter. The initial 
template precursor was a 93 bp PCR product containing a portion of the Cystic Fibrosis 
Transmembrane Conductance Regulator gene that had been amplified directly from 
human genomic DNA. Sequencing was accomplished by template-directed ligation 
using six sequencing cycles. Following sequencing of the first nucleotide, five 
additional nucleotides were sequenced at nine nucleotide intervals, so that the 
sequencing covered a span of 46 nucleotides (1 + (5 x 9) = 46). The non-biotinylated 
primer used to generate the template precursor contained a recognition domain for FokI. 
The opposite primer had a biotinylated 5' end, and was used to bind the template 
precursor to magnetic streptavidin beads. Use of magnetic streptavidin beads allowed 
enzymatic reactions to occur in solution, and facilitated removal of a small aliquot for 
each PCR amplification step during the sequencing cycles. During the sequencing 
cycles, only two sets of adaptors were used, and each unique PCR amplifying primer 
used during the sequencing cycles was identical to the upper strand of the previously 
used adaptor, so that these unique amplifying primers contained the FokI recognition 
domain in their 3' ends, minimizing the number of oligonucleotides synthesized. In this 
protocol, identification of a nucleotide during each sequencing cycle took place using 
four ligation reactions (for the single template precursor). In each ligation, all four 
adaptors were present, with the 3' end of a different one of the four adaptors in each 
ligation tagged with 35S. Quantitation of retained 35S radiolabel was carried out using a 
scintillation counter, and a dominant signal for the correct nucleotide was clearly 
detected during each cycle. The details are outlined below: 

Sequencing Adaptor Generation: 

Adaptor set #1 (lower strands of this adaptor set are shown in the box below) was 
generated as follows: 6.3 µl of the lower strand of the first three of the four adaptors 
(100 pmole/µl) were added, in three separate reactions (one for each oligonucleotide) to 
4.4 µl H2O, 3.3 µl 5x Terminal deoxynucleotidyl transferase buffer (500 mM cacodylate 
buffer, pH 6.8, 5 mM CoCl2, 0.5 mM DTT), 1.3 µl Terminal deoxynucleotidyl 
transferase (20U/µl; Promega, Madison WI) and 1.0 µl [35S]ddATP (12.5 µCi/µl). The 
final oligonucleotide was processed as described above, except that half amounts were 
used. All of the samples were incubated at 37°C for one hour followed by heat 
inactivation at 70°C for 10 minutes, resulting in a final volume of 16.3 µl for the first 
three labeled oligonucleotides, and a final volume of 8.2 µl for the final labeled 
oligonucleotide (with the 5' G). 



5'P-CNNNCATCCGACCCAGGCGTGCG (SEQ ID NO:1) or 
5'P-ANNNCATCCGACCCAGGCGTGCG (SEQ ID NO:2) or 
5'P-TNNNCATCCGACCCAGGCGTGCG (SEQ ID NO:3) or 
5'P-GNNNCATCCGACCCAGGCGTGCG (SEQ ID NO:4); only the 5' end varies 
between these four oligonucleotides, and this nucleotide is underlined; the FokI 
recognition sequence is in bold type; N represents nucleotides with 4-fold degeneracy. 



The 16.3 µl of each of the first three labeled oligonucleotides were separately 
added to 2.5 µl 10X T4 DNA Ligase buffer (660 mM Tris-HCl, 50 mM MgCl2, 10 mM 
dithioerythritol, 10 mM ATP, pH 7.5) and to 6.2 µl of the upper strand of the sequencing 
adaptor (100 pmole/µl): 

5'-CGCACGCCTGGGTCGGATG (SEQ ID NO:5); the FokI recognition sequence is in 
bold type. 

The last labeled oligonucleotide (with the 5' G) was processed as described above, 
except in half amounts, resulting in a final volume of 25 µl for each of the first three 
adaptors and 12.5 µl for the final adaptor. 

Non-radiolabeled counterparts to the above four adaptors were generated by 
adding 20.0 µl (100 pmole/µl) of each of the first three lower strands, separately to 20.0 
µl (100 pmole/µl) of the upper strand, 8.0 µl of 10X T4 DNA Ligase buffer and 32 µl 
H2O, for a final volume of 80 µl, and 10.0 µl (100 pmole/µl) of the final lower strand 
(with the 5' G) was added to half amounts of the above constituents, for a final volume 
of 40 µl. Each of the eight sets of adaptors (four radiolabeled and four non-radiolabeled) 
was incubated at 93°C for 30 seconds followed by annealing at 25°C for 5 minutes. 
The radiolabeled final adaptor (with the 5' G) was added to 12.5 µl H2O, to bring the 
final volume to 25 µl, like the other radiolabeled adaptors, and the 40 µl of the 
non-radiolabeled final adaptor was added to 40 µl H2O, to bring the final volume to 80 µl, 
like the other non-radiolabeled adaptors. Each adaptor with a 5' G was at half the 
concentration of the other adaptors based on ligation data from preliminary experiments. 
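
The volumes and concentrations stated above can be checked with a short calculation. This is only an arithmetic sketch; the dictionary keys and function names are illustrative, and the figures are taken from the text (volumes in µl, concentrations in pmole/µl).

```python
def total_volume(components):
    """Sum component volumes in microliters, rounded to 0.1 µl."""
    return round(sum(components.values()), 1)


def final_concentration(stock_conc, stock_volume, final_volume):
    """Dilution arithmetic: stock_conc * stock_volume / final_volume."""
    return stock_conc * stock_volume / final_volume


# Labeling reaction for each of the first three lower strands.
LABELING = {
    "lower strand": 6.3,
    "H2O": 4.4,
    "5x TdT buffer": 3.3,
    "TdT": 1.3,
    "[35S]ddATP": 1.0,
}

# Annealing mix for each of the first three radiolabeled adaptors.
ANNEALING = {
    "labeled lower strand": 16.3,
    "10X ligase buffer": 2.5,
    "upper strand": 6.2,
}

if __name__ == "__main__":
    print(total_volume(LABELING))                   # labeling reaction volume
    print(total_volume(ANNEALING))                  # annealed adaptor volume
    print(final_concentration(100.0, 20.0, 80.0))   # non-radiolabeled, C/A/T ends
    print(final_concentration(100.0, 10.0, 80.0))   # non-radiolabeled, 5' G end
```

The last two values differ by a factor of two, consistent with the 5' G adaptor being used at half the concentration of the others.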

Each radiolabeled adaptor was added to 25 µl of the non-radiolabeled adaptors 
with the other three 5' ends. This resulted in four adaptor #1 mixes, each with one 
radiolabeled adaptor and the remaining three non-radiolabeled adaptors. Using four 
ligation mixtures allows one to sequence nucleotides using a single label and a simple 
detection apparatus (e.g. a scintillation counter). 

Adaptor set #2 was made the same way as adaptor set #1, except that the four 
oligonucleotides for the lower strands of the adaptors were: 

5'P-CNNNCATCCTCTGGGCTGCACGGG (SEQ ID NO:6) or 
5'P-ANNNCATCCTCTGGGCTGCACGGG (SEQ ID NO:7) or 
5'P-TNNNCATCCTCTGGGCTGCACGGG (SEQ ID NO:8) or 
5'P-GNNNCATCCTCTGGGCTGCACGGG (SEQ ID NO:9); only the 5' end varies 
between each of these four oligonucleotides, and this nucleotide is underlined; the FokI 
recognition sequence is in bold type; N represents nucleotides with 4-fold degeneracy. 

and the oligonucleotide for the upper strand of the adaptors was: 

5'-CCCGTGCAGCCCAGAGGATG (SEQ ID NO:10); the FokI recognition sequence 
is in bold type. 
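
The relationship between the upper and lower adaptor strands listed above can be verified with a short sketch: the lower strand, after its four-nucleotide 5' overhang, should be the reverse complement of the upper strand, and the upper strand should end in the FokI recognition sequence GGATG. The helper names are hypothetical; the sequences are those given in the text, with the 5' C variant of each lower strand used as the representative.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
FOKI_SITE = "GGATG"
OVERHANG_LENGTH = 4  # variable 5' nucleotide followed by NNN

ADAPTOR_SETS = {
    1: {"upper": "CGCACGCCTGGGTCGGATG",
        "lower": "CNNNCATCCGACCCAGGCGTGCG"},
    2: {"upper": "CCCGTGCAGCCCAGAGGATG",
        "lower": "CNNNCATCCTCTGGGCTGCACGGG"},
}


def reverse_complement(seq):
    """Return the complementary strand of `seq`, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def strands_pair(upper, lower, overhang=OVERHANG_LENGTH):
    """True if `lower`, minus its 5' overhang, is the reverse complement of `upper`."""
    return reverse_complement(upper) == lower[overhang:]


if __name__ == "__main__":
    for number, strands in ADAPTOR_SETS.items():
        paired = strands_pair(strands["upper"], strands["lower"])
        ends_in_site = strands["upper"].endswith(FOKI_SITE)
        print(number, paired, ends_in_site)
```

Both adaptor sets pass the check, so each annealed adaptor presents a four-nucleotide 5' overhang immediately adjacent to its recognition domain.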



Initial Sequencing Template Generation: 

PCR amplification of a 93 bp initial template precursor from human genomic 
DNA was carried out using primers A and B (shown in the box below) as follows: 200 
ng human genomic DNA (Promega, Madison WI) in 2.0 µl was placed with 41.6 µl 
H2O, 6.0 µl 10x buffer (100 mM Tris-HCl pH 8.3, 1.0 M KCl, 0.5% Tween 20, 50% 
Glycerol), 4.0 µl containing 5.0 mM each dNTP (100 mM stock (Boehringer Mannheim, 
Indianapolis IN) diluted in H2O), 1.0 µl Primer A (25 pmole/µl), 1.0 µl Primer B (25 
pmole/µl), 4.4 µl 25 mM Mg(OAc)2, in each of four microcentrifuge tubes. A wax bead 
was added (Perkin Elmer, Foster City CA) and the tubes were heated to 80°C for 3 
minutes and then cooled to 25°C. An upper layer of reagents consisting of 35.0 µl H2O, 
4.0 µl 10x buffer and 1.0 µl rTth DNA Polymerase (2.5 U/µl; Perkin Elmer) was placed 
on top of each wax bead, and the four tubes underwent an initial denaturation step at 
94°C for 1 minute followed by 30 thermal cycles using the following parameters (94°C 
for 30 seconds, 50°C for 30 seconds), a final extension at 72°C for 7 minutes, and a 4°C 
soak. 
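
The final reagent concentrations implied by this two-layer reaction can be estimated as below. The sketch assumes that the lower and upper layers mix completely into a single volume once the wax melts; that assumption, and the function and variable names, are introduced here for illustration only.

```python
LOWER_LAYER_UL = [2.0, 41.6, 6.0, 4.0, 1.0, 1.0, 4.4]  # DNA, H2O, buffer, dNTPs, primers A and B, Mg(OAc)2
UPPER_LAYER_UL = [35.0, 4.0, 1.0]                      # H2O, buffer, rTth DNA polymerase


def final_concentration(stock_conc, stock_volume, final_volume):
    """Dilution arithmetic: stock_conc * stock_volume / final_volume."""
    return stock_conc * stock_volume / final_volume


def reaction_volume():
    """Combined volume of both layers in microliters."""
    return round(sum(LOWER_LAYER_UL) + sum(UPPER_LAYER_UL), 1)


if __name__ == "__main__":
    total = reaction_volume()
    print(total)                                    # µl per tube
    print(final_concentration(5.0, 4.0, total))     # mM each dNTP
    print(final_concentration(25.0, 4.4, total))    # mM Mg(OAc)2
    print(final_concentration(25.0, 1.0, total))    # pmole/µl each primer
    print(final_concentration(10.0, 6.0 + 4.0, total))  # buffer strength (x)
```

Four such tubes give 400 µl in total, of which 360 µl is carried into the exonuclease step described next.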



Primer A: GTTTTCCTGGATGATGCCCTGGC (SEQ ID NO:11); mismatch to 
genomic DNA underlined; FokI recognition sequence in bold type. Primer B: 5' Biotin- 
CATGCTTTGATGACGCTTCTGTATC (SEQ ID NO:12); the biotinylated 5' end was 
generated during oligonucleotide synthesis using a biotin phosphoramidite (Glen 
Research, Sterling VA). 



The samples were combined, and 360 µl of this product was incubated with 4.0 
µl Exonuclease I (20U/µl; Epicentre, Madison WI) at 37°C for 30 minutes, followed by 
heat inactivation at 80°C for 15 minutes. The sample was purified by glass bead 
extraction using Mermaid (BIO101, La Jolla CA) and was suspended in 90 µl TE (10.0 
mM Tris-HCl pH 8.0, 1.0 mM EDTA). Eighty µl of this product was digested with 5.0 
µl FokI (3U/µl; Boehringer Mannheim) in the manufacturer's 1x buffer in a total volume 
of 100 µl at 37°C for 1 hour followed by heat inactivation at 65°C for 15 minutes. 87.5 
µl of this product was mixed with 90 µl of washed magnetic streptavidin beads in 2x 
binding-wash buffer (prepared from 150 µl Dynabeads M-280 Streptavidin, Dynal, Oslo 
Norway, as directed by the manufacturer), incubated for 1 hour at room temperature 
(23°C) with mixing to disperse the magnetic beads, magnetically pelleted (Dynal 
Magnetic Pellet Concentrator-E), washed three times in binding-wash buffer, and 
resuspended in 50 µl TE. 



Adaptor Ligation: 

The template underwent ligation separately to each of the four adaptor mixes in 
adaptor set #1 as follows: 12.5 µl of the template was added to 10 µl of each adaptor 
mix, 17.5 µl H2O, 5.0 µl 10X T4 DNA Ligase buffer, and 5.0 µl T4 DNA Ligase (1.0 
U/µl; Boehringer Mannheim, Indianapolis IN) and incubated at 23°C for 1 hour with 
mixing every 15 minutes. Then, the mixture was magnetically pelleted, the supernatant 
removed, and the pellets were washed three times in binding-wash buffer and then were 
resuspended in 50 µl TE. 

Scintillation Counting: 

Forty µl each of the four ligated samples were added to 2.5 ml of scintillation 
fluid (Beckman Ready Gel, Beckman Instruments, Fullerton CA) in a scintillation vial 
and underwent scintillation counting using a Beckman LS 1801 scintillation counter. 

PCR Amplification: 

One µl from each ligation (from the 10 µl remaining that did not undergo 
scintillation counting) underwent PCR amplification as was done in generating the initial 
template precursor, except that 42.6 µl H2O was used (instead of 41.6 µl) and the upper 
strand of sequencing adaptor set #1 was used as the PCR primer in place of Primer A. 

Second Sequencing Cycle: 

The steps were identical to the first sequencing cycle, except that the adaptor set 
used for adaptor ligation was adaptor set #2, and the upper strand of sequencing adaptor 
set #2 was used as a PCR primer instead of the upper strand of sequencing adaptor set 
#1. 

Third Sequencing Cycle: 

The steps were identical to the second sequencing cycle, except that the adaptor 
set used for adaptor ligation was adaptor set #1, and the upper strand of sequencing 
adaptor set #1 was used as a PCR primer instead of the upper strand of sequencing 
adaptor set #2. 

Subsequent Sequencing Cycles: 

Following the third sequencing cycle, the second sequencing cycle was repeated, 
and following this second sequencing cycle, the third sequencing cycle was repeated, 
and following this third sequencing cycle, the second sequencing cycle was repeated 
through the scintillation counting step. 



Sequencing Results: 

The FokI recognition domain is positioned in each ligated adaptor so that one 
nucleotide was sequenced at 9 nucleotide intervals. The initial template precursor is 
shown below, along with its FokI recognition domain (bold type). Underlined sequences 
are the original amplifying primers (Primer A and Primer B). The cut sites for this 
recognition domain, as well as subsequent cut sites directed by ligated adaptors, are 
shown by dissecting lines. Cleavage generates a single-strand overhang that constitutes a 
template, and the nucleotide sequenced at each interval is shown by a numbered asterisk, 
the number identifying the sequencing cycle for sequencing the nucleotide. 

[Figure: double-stranded sequence of the 93 bp initial template precursor, with cut sites 
and numbered asterisks (*1 through *6) marking the nucleotide sequenced in each cycle; 
the lower strand carries biotin at its 5' end.] 

The scintillation counts for each of the four adaptors at each sequencing interval 
(identified by sequencing cycle) are shown below. The highest counts are in bold type. 
Counts for the correct nucleotide were four fold greater than background (counts for any 
other nucleotide) in the first five cycles and greater than twice background in the final 
cycle (cycle 6). 



Sequencing Cycle Number                           1        2        3        4        5        6

Template nucleotide at ligation junction          A        A        T        T        G        T

Predicted 5' end of adaptor                       T        T        A        A        C        A
undergoing ligation

Scintillation counts for adaptors       G       662    1,504    1,625    6,793    1,441    1,779
(identified by 35S labelled 3' end)     A     2,568    1,618   68,007   34,753    3,335   14,397
                                        T    32,917   32,563    5,797    3,934   14,787    2,962
                                        C     1,703      988    1,704    1,745   67,233    5,304

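
The way a nucleotide is called from the four scintillation counts in each cycle can be sketched as follows. The sketch assumes that the four count rows above are ordered G, A, T, C by the 5' end of the radiolabeled adaptor, takes the adaptor with the highest count as the one that ligated, and reports the template nucleotide as its complement. The signal-to-background ratio is computed against the next-highest count. The function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

# Counts per cycle, keyed by the 5' end of the radiolabeled adaptor.
COUNTS_BY_CYCLE = {
    1: {"G": 662, "A": 2568, "T": 32917, "C": 1703},
    2: {"G": 1504, "A": 1618, "T": 32563, "C": 988},
    3: {"G": 1625, "A": 68007, "T": 5797, "C": 1704},
    4: {"G": 6793, "A": 34753, "T": 3934, "C": 1745},
    5: {"G": 1441, "A": 3335, "T": 14787, "C": 67233},
    6: {"G": 1779, "A": 14397, "T": 2962, "C": 5304},
}


def call_cycle(counts):
    """Return (template base, adaptor 5' base, ratio of top count to next-highest)."""
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    adaptor_base, top_count = ranked[0]
    background = ranked[1][1]
    return COMPLEMENT[adaptor_base], adaptor_base, top_count / background


def call_all(counts_by_cycle):
    """Call every cycle, in cycle order."""
    return {cycle: call_cycle(counts)
            for cycle, counts in sorted(counts_by_cycle.items())}


if __name__ == "__main__":
    for cycle, (template, adaptor, ratio) in call_all(COUNTS_BY_CYCLE).items():
        print(cycle, template, adaptor, round(ratio, 1))
```

Applied to the counts above, the calls reproduce the template nucleotides and predicted adaptor ends listed in the table.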

EXAMPLE 1B 

Demonstration of Interval Sequencing Mediated by Class-IIS Restriction 
Endonuclease Generated 5' Overhangs and Template-Directed Ligation 

Using a FokI based strategy, single nucleotides separated by intervals of nine 
nucleotides were sequenced using simple reagents and a scintillation counter. The initial 
template precursor was a 93 bp PCR product containing a portion of the Cystic Fibrosis 
Transmembrane Conductance Regulator gene that had been amplified directly from 
human genomic DNA. Sequencing was accomplished by template-directed ligation 
using three sequencing cycles. Following sequencing of the first nucleotide, two 
additional nucleotides were sequenced at nine nucleotide intervals, so that the 
sequencing covered a span of 19 nucleotides (1 + (2 x 9) = 19). The non-biotinylated 
primer used to generate the template precursor contained a recognition domain for FokI. 
The opposite primer had a biotinylated 5' end, and was used to bind the template 
precursor to magnetic streptavidin beads. Use of magnetic streptavidin beads allowed 
enzymatic reactions to occur in solution, and facilitated removal of a small aliquot for 
each PCR amplification step during the sequencing cycles. During the sequencing 
cycles, only two sets of adaptors were used, and each unique PCR amplifying primer 
used during the sequencing cycles was identical to the upper strand of the previously 
used adaptor. In this test protocol, identification of a nucleotide during each sequencing 
cycle took place using four ligation reactions (for the single template precursor). In each 
ligation, all four adaptors were present, with the 3' end of a different one of the four 
adaptors in each ligation tagged with 32P. Quantitation of retained 32P radiolabel was 
carried out using a scintillation counter, and a dominant signal for the correct nucleotide 
was clearly detected during each cycle. The details are outlined below: 

Sequencing Adaptor Generation: 

Adaptor set #1 (lower strands of this adaptor set are shown in the box below) was 
generated as follows: 20.0 µl of the lower strand of the four adaptors (100 pmole/µl) 
were added, in four separate reactions (one for each oligonucleotide) to 12.5 µl H2O, 
12.0 µl 5x Terminal deoxynucleotidyl transferase buffer (500 mM cacodylate buffer, pH 
6.8, 5 mM CoCl2, 0.5 mM DTT), 3.0 µl Terminal deoxynucleotidyl transferase (20U/µl; 
Promega, Madison WI) and 12.5 µl [32P]dATP (10.0 µCi/µl). All of the samples were 
incubated at 37°C for one hour followed by heat inactivation at 70°C for 10 minutes. 
Unincorporated [32P]dATP was removed from each tube using a Qiagen nucleotide 
removal column (Qiagen, Chatsworth CA) and each oligonucleotide was eluted in 50 µl 
TE. 



5'P-CNNNCATCCGACCCAGGCGTGCG (SEQ ID NO:13) or 
5'P-ANNNCATCCGACCCAGGCGTGCG (SEQ ID NO:14) or 
5'P-TNNNCATCCGACCCAGGCGTGCG (SEQ ID NO:15) or 
5'P-GNNNCATCCGACCCAGGCGTGCG (SEQ ID NO:16); only the 5' end varies 
between these four oligonucleotides, and this nucleotide is underlined; the FokI 
recognition sequence is in bold type; N represents nucleotides with 4-fold degeneracy. 



15.8 µl of each of the first three labeled oligonucleotides were separately added 
to 2.5 µl 10X T4 DNA Ligase buffer (660 mM Tris-HCl, 50 mM MgCl2, 10 mM 
dithioerythritol, 10 mM ATP, pH 7.5), 0.5 µl H2O and to 6.2 µl of the upper strand of the 
sequencing adaptor (100 pmole/µl): 

5'-CGCACGCCTGGGTCGGATG (SEQ ID NO:17); the FokI recognition sequence is 
in bold type. 



The last labeled oligonucleotide (with the 5' G) was processed as described above, 
except in half amounts, resulting in a final volume of 25 µl for each of the first three 
adaptors and 12.5 µl for the final adaptor. 

Non-radiolabeled counterparts to the above four adaptors were generated by 
adding 20.0 µl (100 pmole/µl) of each of the first three lower strands, separately to 20.0 
µl (100 pmole/µl) of the upper strand, 8.0 µl of 10X T4 DNA Ligase buffer and 32 µl 
H2O, for a final volume of 80 µl, and 10.0 µl (100 pmole/µl) of the final lower strand 
(with the 5' G) was added to half amounts of the above constituents, for a final volume 
of 40 µl. Each of the eight sets of adaptors (four radiolabeled and four non-radiolabeled) 
was incubated at 93°C for 30 seconds followed by annealing at 25°C for 5 minutes. 
The radiolabeled final adaptor (with the 5' G) was added to 12.5 µl H2O, to bring the 
final volume to 25 µl, like the other radiolabeled adaptors, and the 40 µl of the 
non-radiolabeled final adaptor was added to 40 µl H2O, to bring the final volume to 80 µl, 
like the other non-radiolabeled adaptors. Each adaptor with a 5' G was at half the 
concentration of the other adaptors based on ligation data from preliminary experiments. 

Each radiolabeled adaptor was added to 25 µl of the non-radiolabeled adaptors 
with the other three 5' ends. This resulted in four adaptor #1 mixes, each with one 
radiolabeled adaptor and the remaining three non-radiolabeled adaptors. Using four 
ligation mixtures allows one to sequence nucleotides using a single label and a simple 
detection apparatus (e.g. a scintillation counter). 

Adaptor set #2 was made the same way as adaptor set #1, except that the four 
oligonucleotides for the lower strands of the adaptors were: 

5'P-CNNNCATCCTCTGGGCTGCACGGG (SEQ ID NO:18) or 
5'P-ANNNCATCCTCTGGGCTGCACGGG (SEQ ID NO:19) or 
5'P-TNNNCATCCTCTGGGCTGCACGGG (SEQ ID NO:20) or 
5'P-GNNNCATCCTCTGGGCTGCACGGG (SEQ ID NO:21); only the 5' end varies 
between each of these four oligonucleotides, and this nucleotide is underlined; the FokI 
recognition sequence is in bold type; N represents nucleotides with 4-fold degeneracy. 

and the oligonucleotide for the upper strand of the adaptors was: 

5'-CCCGTGCAGCCCAGAGGATG (SEQ ID NO:22); the FokI recognition sequence 
is in bold type. 

Initial Sequencing Template Generation: 

PCR amplification of a 93 bp initial template precursor from human genomic 
DNA was carried out as described in Example 1. 

The samples were combined and mixed with 400 µl of washed magnetic 
streptavidin beads in 2x binding-wash buffer (prepared from 140 µl Dynabeads M-280 
Streptavidin, Dynal, Oslo Norway, as directed by the manufacturer), incubated for 1 
hour at room temperature (23°C) with mixing to disperse the magnetic beads, 
magnetically pelleted (Dynal Magnetic Pellet Concentrator-E), washed three times in 
binding-wash buffer, and resuspended in 100 µl H2O. This product was digested with 
7.0 µl FokI (3U/µl; Boehringer Mannheim) in the manufacturer's 1x buffer in a total 
volume of 150 µl at 37°C for 1 hour, with mixing every 15 minutes, magnetically 
pelleted, washed three times in binding-wash buffer, and the template was suspended in 
50 µl H2O. 

Adaptor Ligation: 

The template underwent ligation separately to each of the four adaptor mixes in 
adaptor set #1 as follows: 12.5 µl of the template was added to 10 µl of each adaptor 
mix, 18.5 µl H2O, 4.0 µl 10X T4 DNA Ligase buffer, and 5.0 µl T4 DNA Ligase (1.0 
U/µl; Boehringer Mannheim, Indianapolis IN) and incubated at 23°C for 1 hour with 
mixing every 15 minutes. Then, the mixture was magnetically pelleted, the pellets were 
washed three times in binding-wash buffer and then were resuspended in 50 µl TE (10.0 
mM Tris-HCl pH 8.0, 1.0 mM EDTA). 

Scintillation Counting: 

Forty µl each of the four ligated samples were added to 2.5 ml of scintillation 
fluid (Beckman Ready Gel, Beckman Instruments, Fullerton CA) in a scintillation vial 
and underwent scintillation counting using a Beckman LS 1801 scintillation counter. 

PCR Amplification: 

One µl from each ligation (from the 10 µl remaining that did not undergo 
scintillation counting) underwent PCR amplification as was done in generating the initial 
template precursor, except that 42.6 µl H2O was used (instead of 41.6 µl) and the upper 
strand of sequencing adaptor set #1 was used as the PCR primer in place of Primer A. 

Second Sequencing Cycle: 

The steps were identical to the first sequencing cycle, except that the adaptor set 
used for adaptor ligation was adaptor set #2, and the upper strand of sequencing adaptor 
set #2 was used as a PCR primer instead of the upper strand of sequencing adaptor set 
#1.

Third Sequencing Cycle:

The template precursor that had been amplified in the second sequencing cycle
underwent binding to magnetic streptavidin, FokI digestion, adaptor ligation, and
scintillation counting as was done in the second sequencing cycle, except that the
adaptor set used for adaptor ligation was adaptor set #1.

Sequencing Results: 

The FokI recognition domain is positioned in each ligated adaptor so that one
nucleotide was sequenced at 9 nucleotide intervals. The scintillation counts for each of
the four adaptors at each sequencing interval (identified by sequencing cycle) are shown
below. The highest counts are in bold type. The second adaptor set did not label as
efficiently as the first adaptor set. Counts for the correct nucleotide were more than
12-fold greater than background (counts for any other nucleotide) in the first three
cycles. Counts for the correct nucleotide were dominant for cycles 4 and 5, but were less
than 2-fold over background.



Sequencing Cycle Number                           1        2        3        4        5

Template nucleotide at ligation junction          A        A        T        T        G

Predicted 5' end of adaptor undergoing            T        T        A        A        C
ligation

Scintillation counts for adaptors       G       712      329    1,337    2,420    1,597
(identified by 32P labelled 3' end)     A     1,933      344  *40,284   *3,169   11,394
                                        T   *25,668   *6,769    3,105    1,404    7,307
                                        C     1,007      366    1,330      242  *21,178

* Highest count for the cycle (bold type in the original).
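
The fold-over-background figures quoted for this example can be rechecked from the table. The following is a minimal sketch, assuming (as the text states) that background is the highest count among the other three adaptors in the same cycle; the names `COUNTS` and `call_cycle` are illustrative and not part of the described method.

```python
# Scintillation counts from the Example 1 table, keyed by the adaptor's
# variable end nucleotide; one list entry per sequencing cycle (1-5).
COUNTS = {
    "G": [712, 329, 1337, 2420, 1597],
    "A": [1933, 344, 40284, 3169, 11394],
    "T": [25668, 6769, 3105, 1404, 7307],
    "C": [1007, 366, 1330, 242, 21178],
}


def call_cycle(cycle_index):
    """Return (adaptor base with the highest count, fold over the next-highest count)."""
    ranked = sorted(((c[cycle_index], base) for base, c in COUNTS.items()), reverse=True)
    (top, base), (runner_up, _) = ranked[0], ranked[1]
    return base, top / runner_up


calls = [call_cycle(i) for i in range(5)]
for cycle, (base, fold) in enumerate(calls, start=1):
    print(f"cycle {cycle}: adaptor {base}, {fold:.1f}-fold over background")
```

Under this reading the first three cycles exceed 12-fold and cycles 4 and 5 stay below 2-fold, in line with the summary above.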
EXAMPLE 2

Demonstration of Interval Sequencing Mediated by Class-IIS Restriction 
Endonuclease Generated 3' overhangs and Template-Directed Ligation 

A BseRI-based protocol was used to sequence single nucleotides separated by
intervals of eight nucleotides using a scintillation counter. The initial template precursor
was a 103 bp PCR product containing a portion of the Cystic Fibrosis Transmembrane
Conductance Regulator gene that had been amplified directly from human genomic
DNA. Sequencing was accomplished by template-directed ligation using three
sequencing cycles, and covered a span of 17 nucleotides (1 + (2 x 8) = 17). The
non-biotinylated primer used to generate the template precursor contained a recognition
domain for BseRI. The opposite primer had a biotinylated 5' end, and was used to bind
the template precursor to magnetic streptavidin beads. During the sequencing cycles,
only two sets of adaptors were used, and each unique PCR amplifying primer used
during the sequencing cycles was identical to the upper strand of the previously used
adaptor, except it did not have the final two nucleotides on the 3' end, so that these
unique amplifying primers contained the BseRI recognition domain in their 3' ends
ensuring sufficient length for efficient priming when using these adaptors. In this test
protocol, identification of a nucleotide during each sequencing cycle took place using
four ligation reactions (for the single template precursor). In each ligation, all four
adaptors were present, with the 5' end of a different one of the four adaptors in each
ligation tagged with 32P. Quantitation of retained 32P radiolabel was carried out using a
scintillation counter. Signal for the correct nucleotide was more than four-fold greater
than background in each of the three cycles. The details are outlined below:
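
The span arithmetic above (1 + (2 x 8) = 17) generalizes to any number of cycles and any interval. A minimal sketch, with hypothetical helper names `sequenced_offsets` and `span`:

```python
def sequenced_offsets(cycles, interval):
    """Offsets of the nucleotides read, counting the first read as offset 0."""
    return [cycle * interval for cycle in range(cycles)]


def span(cycles, interval):
    """Number of nucleotides from the first to the last read position, inclusive."""
    return 1 + (cycles - 1) * interval


print(sequenced_offsets(3, 8))  # three cycles at 8-nucleotide intervals
print(span(3, 8))
```

With the five cycles and 9-nucleotide interval of Example 1, the same formula gives 1 + (4 x 9) = 37 nucleotides.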



Sequencing Adaptor Generation:

Adaptor set #1 (upper strands of this adaptor set are shown in the box below) was
generated as follows: 4.0 µl of the upper strand of the four adaptors (100 pmole/µl) were
added, in four separate reactions (one for each oligonucleotide) to 5.0 µl H2O, 16.0 µl
10x Polynucleotide Kinase buffer (700 mM Tris-HCl (pH 7.6), 100 mM MgCl2, 50 mM
dithiothreitol), 10.0 µl T4 Polynucleotide Kinase (10 U/µl; New England BioLabs,
Beverly MA) and 125.0 µl [32P]ATP (2.0 µCi/µl). All of the samples were incubated at
37°C for one hour followed by heat inactivation at 65°C for 20 minutes. Unincorporated
[32P]ATP was removed from each tube using a Qiagen nucleotide removal column
(Qiagen, Chatsworth CA) and each oligonucleotide was eluted in 50 µl TE.

5' CGCACGGCTGGGTCGGAGGAGNC (SEQ ID NO:23) or
5' CGCACGGCTGGGTCGGAGGAGNA (SEQ ID NO:24) or
5' CGCACGGCTGGGTCGGAGGAGNT (SEQ ID NO:25) or
5' CGCACGGCTGGGTCGGAGGAGNG (SEQ ID NO:26); only the 3' end varies
between each oligonucleotide, and this nucleotide is underlined; the BseRI recognition 
sequence is in bold type; N represents nucleotides with 4-fold degeneracy. 

The four labeled oligonucleotides (8 pmole/µl) were separately added to an equal
volume of the lower strand of the adaptor

5' CTCCTCCGACCCAGCCGTGCG (SEQ ID NO:27); the BseRI recognition sequence
is in bold type.

suspended in 2X T4 DNA Ligase buffer (8 pmole/µl). Non-radiolabeled counterparts to
the above four adaptors were generated as follows: Unlabeled upper strands of the
adaptors (8 pmole/µl) were added, separately, to an equal volume of the lower strand of
the adaptor suspended in 2X T4 DNA Ligase buffer (8 pmole/µl). Each of the eight sets
of adaptors (four radiolabeled and four non-radiolabeled) were incubated at 93°C for 30
seconds followed by annealing at 25°C for 5 minutes. Five µl of each radiolabeled
adaptor was added to 5 µl of each of the non-radiolabeled adaptors with the other three
3' ends. This resulted in four adaptor #1 mixes, each with one radiolabeled adaptor and
the remaining three non-radiolabeled adaptors.
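
The composition of the four adaptor mixes can be laid out programmatically. The sketch below is illustrative only: `STEM`, `ENDS`, `expand` and `mixes` are hypothetical names, and the expansion of N simply enumerates the four nucleotides the text says it represents.

```python
from itertools import product

STEM = "CGCACGGCTGGGTCGGAGGAG"  # shared 5' portion of SEQ ID NO:23-26
ENDS = "CATG"                   # variable 3' terminal nucleotides of the four upper strands


def expand(upper):
    """Expand each degenerate N into the four concrete oligonucleotides."""
    return ["".join(p) for p in product(*("ACGT" if b == "N" else b for b in upper))]


# One mix per radiolabeled 3' end; the other three adaptors are unlabeled.
mixes = {
    hot: {
        "labeled": STEM + "N" + hot,
        "unlabeled": [STEM + "N" + cold for cold in ENDS if cold != hot],
    }
    for hot in ENDS
}

# Every possible two-nucleotide 3' end is represented across the four upper strands.
all_ends = {oligo[-2:] for end in ENDS for oligo in expand(STEM + "N" + end)}
print(len(mixes), len(all_ends))
```

Each mix therefore presents all sixteen possible two-nucleotide 3' ends to the template, but only the four sharing one terminal nucleotide carry label.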

Adaptor set #2 was made the same way as adaptor set #1, except that the four
oligonucleotides for the upper strands of the adaptors were:

WO 99/45153 






PCT/US99/04883 



5' GGTGCGCCAGTCCAGCGAGGAGNC (SEQ ID NO:28) or
5' GGTGCGCCAGTCCAGCGAGGAGNA (SEQ ID NO:29) or
5' GGTGCGCCAGTCCAGCGAGGAGNT (SEQ ID NO:30) or
5' GGTGCGCCAGTCCAGCGAGGAGNG (SEQ ID NO:31); only the 3' end varies
between each oligonucleotide, and this nucleotide is underlined; the BseRI recognition
sequence is in bold type; N represents nucleotides with 4-fold degeneracy.



The oligonucleotide for the lower strand of the adaptors was:

5' CTCCTCGCTGGACTGGCGCACC (SEQ ID NO:32); the BseRI recognition
sequence is in bold type.
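
As a consistency check on the adaptor sequences listed above, the lower strand of each set should be the reverse complement of the shared portion of its upper strands, leaving the final two upper-strand nucleotides (N plus the variable 3' nucleotide) unpaired. A minimal sketch, with hypothetical names `ADAPTOR_SETS` and `reverse_complement`:

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


# (shared 5' portion of the upper strands, lower strand), both written 5' to 3'.
ADAPTOR_SETS = {
    1: ("CGCACGGCTGGGTCGGAGGAG", "CTCCTCCGACCCAGCCGTGCG"),    # SEQ ID NO:23-26 / NO:27
    2: ("GGTGCGCCAGTCCAGCGAGGAG", "CTCCTCGCTGGACTGGCGCACC"),  # SEQ ID NO:28-31 / NO:32
}

report = {}
for number, (upper_stem, lower) in ADAPTOR_SETS.items():
    paired = reverse_complement(upper_stem) == lower
    overhang = len(upper_stem) + 2 - len(lower)  # upper strand carries 2 extra 3' nucleotides
    report[number] = (paired, overhang)
    print(f"adaptor set #{number}: strands pair: {paired}; 3' overhang: {overhang} nt")
```

Both sets pair exactly and expose a two-nucleotide 3' overhang, and both shared portions end in GAGGAG, the BseRI recognition sequence.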



Initial Sequencing Template Generation: 

PCR amplification of a 103 bp initial template precursor from human genomic
DNA was carried out as in Example 1, except that Primer A had the following sequence:

5' TCTGTTCTCAGTTTTCCTGGATGAGGAGTGGCACC (SEQ ID NO:33);
mismatches to genomic DNA underlined; BseRI recognition sequence in bold type.



The samples were combined, and the 400 µl was digested with 5.0 µl BseRI
(4 U/µl; New England BioLabs) in the manufacturer's 1x buffer in a total volume of 460
µl at 37°C for 1 hour followed by heat inactivation at 65°C for 20 minutes. This product
was mixed with 460 µl of washed magnetic streptavidin beads (140 µl Dynabeads
washed and then suspended in 2x binding-wash buffer following the manufacturer's
instructions), incubated for 1 hour at room temperature (23°C) with mixing to disperse
the magnetic beads, magnetically pelleted (Dynal Magnetic Pellet Concentrator-E),
washed three times in binding-wash buffer, and resuspended in 50 µl TE.



Adaptor Ligation:

The template underwent ligation separately to each of the four adaptor mixes in
adaptor set #1 as follows: 12.5 µl of the template was added to 20 µl of each adaptor
mix, 9.5 µl H2O, 3.0 µl 10X T4 DNA Ligase buffer, and 5.0 µl T4 DNA Ligase (1.0
U/µl; Boehringer Mannheim, Indianapolis IN) and incubated at 23°C for 1 hour with
mixing every 15 minutes. Then, the mixture was magnetically pelleted, and the pellets
were washed three times in binding-wash buffer and then were resuspended in 50 µl TE.

Scintillation Counting: 

Twenty-five µl of each of the four ligated samples was added to 2.5 ml of
scintillation fluid (Beckman Ready Gel) in a scintillation vial and underwent scintillation
counting using a Beckman LS 1801 scintillation counter.

PCR Amplification: 

One µl from each ligation (of the 10 µl remaining that did not undergo
scintillation counting) underwent PCR amplification as was done in generating the initial
template precursor, except that 42.6 µl H2O was used (instead of 41.6 µl) and

5' CGCACGGCTGGGTCGGAGGAG (SEQ ID NO:34); BseRI recognition sequence is
in bold type.

was used as the PCR primer in place of Primer A.



Second Sequencing Cycle: 

The steps were identical to the first sequencing cycle, except that the adaptor set
used for adaptor ligation was adaptor set #2, and

5' GGTGCGCCAGTCCAGCGAGGAG (SEQ ID NO:35); BseRI recognition sequence
is in bold type.

was used as the PCR primer replacing Primer A.



Third Sequencing Cycle: 

The template precursor that had been amplified in the second sequencing cycle
underwent BseRI digestion, binding to magnetic streptavidin, adaptor ligation and
scintillation counting as was done in the second sequencing cycle, except that the
adaptor set used for adaptor ligation was adaptor set #1.

Sequencing Results:

The BseRI recognition domain is positioned in each ligated adaptor so that one
nucleotide was sequenced at 8 nucleotide intervals. The initial template precursor is
shown below, along with its BseRI recognition domain (bold type). Underlined
sequences are the original amplifying primers (Primer A and Primer B). The cut sites
for this recognition domain, as well as subsequent cut sites directed by ligated adaptors,
are shown by dissecting lines. Cleavage generates a single-strand overhang that
constitutes a template, and the nucleotide sequenced at each interval is shown by a
numbered asterisk, the number identifying the sequencing cycle for sequencing the
nucleotide.

5'-TCTGTTCTCAGTTTTCCTGGATGAGGAGTGGCACCATT|AAAGAAAA|TATCATCT|TTGGTGTT-20 bp-GATACAGAAGCGTCATCAAAGCATG-3'
3'-AGACAAGAGTCAAAAGGACCTACTCCTCACCGTGGT|AATTTCTT|TTATAGTA|GAAACCACAA-20 bp-CTATGTCTTCGCAGTAGTTTCGTAC-Biotin
                                         *1       *2       *3
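
The cut positions described above can be traced with a short calculation. The sketch below assumes the standard BseRI geometry (cleavage 10 nucleotides past GAGGAG on the upper strand and 8 on the lower strand, leaving a 2-nucleotide 3' overhang) and takes the upper strand as Primer A followed by the next 27 template nucleotides read from the diagram, where damaged characters have been reconstructed. The function `junction_reads` and the constants are illustrative names only.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
SITE = "GAGGAG"        # BseRI recognition sequence
TOP_REACH = 10         # upper-strand cut, nucleotides past the recognition sequence
INTERVAL = 8           # each ligated adaptor places its site 2 nt before the junction

# Primer A (SEQ ID NO:33) plus 27 downstream nucleotides (reconstructed reading).
TOP = "TCTGTTCTCAGTTTTCCTGGATGAGGAGTGGCACC" + "ATTAAAGAAAATATCATCTTTGGTGTT"


def junction_reads(top, cycles=3):
    """Return (template base, adaptor 3' base) at the ligation junction per cycle."""
    site_end = top.index(SITE) + len(SITE)
    reads = []
    for cycle in range(cycles):
        top_cut = site_end + TOP_REACH + cycle * INTERVAL
        inner = top_cut - 1           # overhang position adjacent to the duplex
        adaptor_base = top[inner]     # the adaptor 3' end restores this upper-strand base
        reads.append((adaptor_base.translate(COMPLEMENT), adaptor_base))
    return reads


print(junction_reads(TOP))
```

Under these assumptions the predicted template nucleotides are A, T and A, paired with adaptor 3' ends T, A and T.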

The scintillation counts for each of the four adaptors at each sequencing interval
(identified by sequencing cycle) are shown below. The highest counts are in bold type.
Signal for the correct nucleotide was more than four-fold greater than background in
each of the three cycles.

Sequencing Cycle Number                             1          2          3

Template nucleotide at ligation junction            A          T          A

Predicted 3' end of adaptor undergoing              T          A          T
ligation

Scintillation counts for adaptors       G     146,170    111,660    100,550
(identified by phosphorylated 5' end)   A     130,570   *507,140     32,023
                                        T  *1,290,660     83,787   *668,140
                                        C     209,660     95,120     51,515

* Highest count for the cycle (bold type in the original).
This invention was also tested to see whether it could detect a heterozygote for the cystic
fibrosis delta 508 mutation. In this carrier, one would expect the third cycle to detect
both an A and a C (ligation of adaptors with a 3' T or G). In this test, all adaptors with a
3' G were at half the concentration used previously, since the adaptors with a 3' G tended
to give higher background counts, and following the sequencing of the initial template,
templates were diluted 1:10 prior to PCR amplification. The results are shown below:

Sequencing Cycle Number                             1          2          3

Template nucleotide at ligation junction            A          T    A and C

Predicted 3' end of adaptor undergoing              T          A    T and G
ligation

Scintillation counts for adaptors       G      38,430     42,824    102,340
(identified by phosphorylated 5' end)   A      77,540    198,350     10,968
                                        T     598,840     40,092    110,640
                                        C     125,320     47,620     21,430



The heterozygote was clearly detected, with counts more than four-fold higher for each
of the two predicted nucleotides than the background counts for the other nucleotides.
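
One way to turn such counts into a heterozygote call is sketched below. The rule used here (report a second nucleotide when the second-highest count is at least four-fold above the third-highest) is an assumption chosen to mirror the four-fold figure quoted above, not a procedure stated in this example; `call_template_bases` and `min_fold` are hypothetical names.

```python
# Counts from the heterozygote table, keyed by the adaptor's 3' nucleotide (cycles 1-3).
COUNTS = {
    "G": [38430, 42824, 102340],
    "A": [77540, 198350, 10968],
    "T": [598840, 40092, 110640],
    "C": [125320, 47620, 21430],
}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def call_template_bases(cycle_index, min_fold=4.0):
    """Return the template nucleotide(s) inferred for one cycle."""
    ranked = sorted(((c[cycle_index], b) for b, c in COUNTS.items()), reverse=True)
    adaptors = [ranked[0][1]]
    # Assumed rule: accept a second allele only if it clears background by min_fold.
    if ranked[1][0] >= min_fold * ranked[2][0]:
        adaptors.append(ranked[1][1])
    return sorted(COMPLEMENT[b] for b in adaptors)


calls = [call_template_bases(i) for i in range(3)]
print(calls)
```

With this rule the first two cycles give single calls and the third cycle returns both A and C, as predicted for the carrier.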

EXAMPLE 3 

Demonstration of Interval Sequencing Template Generation Mediated by Class-IIS
Restriction Endonuclease Generated 5' overhangs, Template-Directed
Polymerization and Adaptor Ligation

A FokI-based protocol was used to generate a series of templates separated by
intervals of nine nucleotides. The initial template precursor was the identical 93 bp PCR
product that was used as the initial template precursor in Example 1. During the
sequencing cycles, only two adaptors were used, and each unique PCR amplifying
primer used during the sequencing cycles was identical to the upper strand of the
previously used adaptor. In this test protocol, sequencing was simulated by the
incorporation of a ddNTP into the template during five sequencing cycles, and
successful trimming of the template was confirmed by acrylamide gel resolution of the
PCR products constituting the template precursors during each simulated sequencing
cycle. The template was trimmed as predicted over the five sequencing cycles. The
details are given below:



Sequencing Adaptor Generation: 

Adaptor #1 was generated as follows:

30 µl of the lower strand of adaptor #1 (100 pmole/µl):

5' NNNCATCCGACCCAGGCGTGCG (SEQ ID NO:36); the FokI recognition
sequence is in bold type; N represents nucleotides with 4-fold degeneracy.

and 30 µl of the upper strand of adaptor #1 (100 pmole/µl):

5' CGCACGCCTGGGTCGGATG (SEQ ID NO:37); the FokI recognition sequence is
in bold type.

were added to 12 µl H2O and to 8.0 µl 10X T4 DNA Ligase buffer. The adaptor was
incubated at 93°C for 30 seconds followed by annealing at 25°C for 5 minutes.
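
The annealing mix can be checked arithmetically. The sketch below assumes the volumes are in microlitres and the oligonucleotide stocks are 100 pmole per microlitre, which is how the damaged units in this passage are read here; the variable names are illustrative.

```python
# Component volumes of the adaptor #1 annealing mix, in microlitres (assumed units).
volumes_ul = {
    "lower strand": 30.0,
    "upper strand": 30.0,
    "H2O": 12.0,
    "10X ligase buffer": 8.0,
}
stock_pmol_per_ul = 100.0  # stated oligonucleotide stock concentration

total_ul = sum(volumes_ul.values())
buffer_x = 10.0 * volumes_ul["10X ligase buffer"] / total_ul
strand_pmol_per_ul = stock_pmol_per_ul * volumes_ul["upper strand"] / total_ul

print(total_ul, buffer_x, strand_pmol_per_ul)
```

On this reading the 80 µl mix is at 1X ligase buffer, with each strand at 37.5 pmole/µl.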

Adaptor #2 was made the same way as adaptor #1, except that the
oligonucleotide for the lower strand of adaptor #2 was:

5' NNNCATCCTCTGGGCTGCACGGG (SEQ ID NO:38); the FokI recognition
sequence is in bold type; N represents nucleotides with 4-fold degeneracy.

and the oligonucleotide for the upper strand of the adaptor was:

5' CCCGTGCAGCCCAGAGGATG (SEQ ID NO:39); the FokI recognition sequence is
in bold type.



Initial Sequencing Template Generation: 

PCR amplification of a 93 bp initial template precursor from human genomic
DNA was carried out as described in Example 1, except that only 100 µl (one tube) was
amplified. Following PCR amplification, 50 µl was removed to be run on an acrylamide
gel later. The remaining 50 µl was mixed with 100 µl of washed magnetic streptavidin
beads (16 µl Dynabeads M-280 Streptavidin washed and suspended in 2x binding-wash
buffer) and 50 µl H2O, incubated for 1 hour at 23°C with mixing, magnetically pelleted,
washed three times in binding-wash buffer, and resuspended in 50 µl H2O. This product
was digested with 1.0 µl FokI (3 U/µl) with mixing every 15 minutes in the 1x restriction
endonuclease buffer in a total volume of 100 µl at 37°C for 1 hour, magnetically
pelleted, washed three times in binding-wash buffer, and resuspended in 25 µl H2O.

Template Directed Polymerization Using Nucleotide Terminators: 

This product was added to 10 µl of each ddNTP (500 µM each), 14 µl H2O, 20 µl
5x Sequenase buffer, and 1.0 µl Sequenase (Amersham) and was incubated at 23°C for
20 minutes with mixing every 10 minutes. The mixture was magnetically pelleted,
washed three times in binding-wash buffer and suspended in 25 µl TE.

Adaptor Ligation:

The template (following simulated sequencing by ddNTP fill-in) underwent
ligation to adaptor #1 as follows: 25 µl of the template was added to 10 µl of adaptor #1,
6.0 µl H2O, 4.0 µl 10X T4 DNA Ligase buffer, and 5.0 µl T4 DNA Ligase (1.0 U/µl)
and incubated at 23°C for 1 hour with mixing every 15 minutes. Then, the mixture was
magnetically pelleted, washed three times in binding-wash buffer, and suspended in 50
µl TE.

PCR Amplification: 

1 µl from the ligation underwent PCR amplification as was done in generating
the initial template precursor, except that 42.6 µl H2O was used (instead of 41.6 µl) and
the upper strand of adaptor #1 was used as the PCR primer in place of Primer A.



Second Sequencing Cycle:

The steps were identical to the first sequencing cycle, except that the adaptor
used for adaptor ligation was adaptor #2, and the upper strand of adaptor #2 was used as
a PCR primer instead of the upper strand of adaptor #1.

Third Sequencing Cycle:

Identical to the second sequencing cycle, except that the adaptor used for adaptor
ligation was adaptor #1, and the upper strand of adaptor #1 was used as a PCR primer
instead of the upper strand of adaptor #2.



Subsequent Sequencing Cycles: 

Following the third sequencing cycle, the second sequencing cycle was repeated, 
and following this second sequencing cycle, the third sequencing cycle was repeated. 

Results:

Following each PCR amplification, generating the template precursors, 50 µl
were removed and were later run on an acrylamide gel, as shown in Figure 5. Following
the sequencing cycles 1 - 5, the template precursor was trimmed as predicted, with high
specificity in the first four sequencing cycles, and some extraneous product in the
template-precursor following the fifth sequencing cycle.



EXAMPLE 3B 

Demonstration of Interval Sequencing Mediated by Class-IIS Restriction
Endonuclease Generated 5' overhangs, Template-Directed Polymerization and
Adaptor Ligation

This example is essentially the same as Example 3, except that during each
template-directed polymerization with ddNTPs, a 33P labeled ddNTP was substituted for
its corresponding normal ddNTP, in four separate template-directed polymerizations,
each with a single and different radiolabeled ddNTP. Then, an aliquot from each of
these reactions underwent scintillation counting.

Sequencing Adaptor Generation:

Sequencing adaptor generation was carried out as described in Example 3. 

Initial Sequencing Template Generation: 

PCR amplification of the initial template precursor from human genomic DNA
was carried out as described in Example 3, except that two tubes were amplified (200
µl). Following PCR amplification, the entire PCR product was bound to 200 µl of
washed magnetic streptavidin beads (64 µl Dynabeads M-280 Streptavidin washed and
suspended in 2x binding-wash buffer), incubated for 1 hour at 23°C with mixing,
magnetically pelleted, washed three times in binding-wash buffer, and resuspended in
100 µl H2O. This product was digested with 4.0 µl FokI (3 U/µl) in the corresponding 1x
restriction endonuclease buffer in a total volume of 150 µl at 37°C for 1 hour with
mixing every 15 minutes, magnetically pelleted, washed three times in binding-wash
buffer, and resuspended in 100 µl H2O.

Template Directed Polymerization using Nucleotide Terminators: 

25 µl underwent four separate template directed polymerizations using ddNTPs,
each exactly as was done in Example 3, except a different three non-radiolabeled
ddNTPs were added in each reaction, with the fourth ddNTP being 5.0 µl of the
corresponding 33P ddNTP (0.45 µCi/µl; Amersham). Also, 19 µl H2O were used instead
of 14 µl H2O, and 3 U of Sequenase (1.2 µl of a 1:5 dilution in 1x Sequenase buffer)
were used instead of 1 µl of undiluted Sequenase (13 U/µl). Following incubation for 20
minutes at 23°C with mixing every 10 minutes, each mixture was magnetically pelleted,
washed three times in binding-wash buffer and suspended in 50 µl H2O.
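
The enzyme and label amounts quoted above follow from simple dilution arithmetic. A minimal sketch, assuming the damaged units read as microlitres, U per microlitre and microcuries per microlitre; the variable names are illustrative.

```python
stock_units_per_ul = 13.0     # undiluted Sequenase, U per microlitre
dilution_factor = 5.0         # 1:5 dilution in 1x Sequenase buffer
volume_used_ul = 1.2

units_added = volume_used_ul * stock_units_per_ul / dilution_factor

label_volume_ul = 5.0         # radiolabeled ddNTP added per reaction
label_uci_per_ul = 0.45
label_added_uci = label_volume_ul * label_uci_per_ul

print(round(units_added, 2), round(label_added_uci, 2))
```

This gives about 3.1 U of enzyme, consistent with the stated 3 U, and about 2.25 µCi of label per reaction.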

Scintillation Counting: 

40 µl underwent scintillation counting as described in Example 1.

Adaptor Ligation: 

The remaining 10 µl of each of the four samples were combined, and underwent
adaptor ligation as in Example 3, except that 10 µl of 10x ligase buffer and 35 µl H2O
were used, resulting in a final volume of 100 µl, and following ligation, magnetic
pelleting and washing, the pellet was suspended in 25 µl TE.



PCR Amplification: 

One µl from the ligation underwent PCR amplification in each of two tubes as
was done in generating the initial template precursor, except that 42.6 µl H2O was used
(instead of 41.6 µl) and the upper strand of adaptor #1 was used as the PCR primer in
place of Primer A.



Second Sequencing Cycle:

The steps were identical to the first sequencing cycle, except that the adaptor
used for adaptor ligation was adaptor #2, and the upper strand of adaptor #2 was used as
a PCR primer instead of the upper strand of adaptor #1.

Third Sequencing Cycle:

Identical to the second sequencing cycle, except that the adaptor used for adaptor
ligation was adaptor #1, and the upper strand of adaptor #1 was used as a PCR primer
instead of the upper strand of adaptor #2.

Subsequent Sequencing Cycles:

Following the third sequencing cycle, the second sequencing cycle was repeated, 
and following this second sequencing cycle, the third sequencing cycle was repeated 
through the scintillation counting step. 



Sequencing Results:

The scintillation counts at each sequencing interval (identified by sequencing
cycle) are shown below. The highest counts are in bold type. Counts for the correct
nucleotide were more than 3.5-fold greater than background (counts for any other
nucleotide) in each of the five cycles.

Sequencing Cycle Number                             1          2          3          4          5

Template nucleotide adjacent to                     A          A          T          T          G
double-stranded domain

Predicted ddNTP incorporated by                     T          T          A          A          C
template-directed polymerization

Scintillation counts for incorporated   G      51,444     20,848     74,217    261,280     12,436
33P labelled ddNTPs                     A     255,340     58,063 *3,433,960 *2,805,872    167,928
                                        T    *897,960 *2,061,827      9,434     43,309    229,760
                                        C      13,124      7,490      7,877     18,042   *886,184

* Highest count for the cycle (bold type in the original).
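
The table can be read back into the interval sequence it encodes. The sketch below takes the ddNTP with the highest count in each cycle, complements it to obtain the template nucleotide, and places the reads at the 9-nucleotide spacing of the FokI protocol; `read_template` is a hypothetical helper, and treating the next-highest count as background follows the definition given in the text.

```python
# Counts from the Example 3B table, keyed by the labeled ddNTP (cycles 1-5).
COUNTS = {
    "G": [51444, 20848, 74217, 261280, 12436],
    "A": [255340, 58063, 3433960, 2805872, 167928],
    "T": [897960, 2061827, 9434, 43309, 229760],
    "C": [13124, 7490, 7877, 18042, 886184],
}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
INTERVAL = 9  # nucleotides between successive reads in the FokI protocol


def read_template():
    """Return (offset, template base, fold over next-highest count) per cycle."""
    reads = []
    for cycle in range(5):
        ranked = sorted(((c[cycle], base) for base, c in COUNTS.items()), reverse=True)
        (top, ddntp), (runner_up, _) = ranked[0], ranked[1]
        reads.append((cycle * INTERVAL, COMPLEMENT[ddntp], top / runner_up))
    return reads


reads = read_template()
for offset, base, fold in reads:
    print(f"offset {offset:2d}: template {base} ({fold:.1f}-fold)")
```

The template nucleotides recovered this way are A, A, T, T and G, the same five reported by the ligation-based readout in Example 1.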



EXAMPLE 4 

This example demonstrates a method that uses restriction endonuclease digestion
to selectively remove primer directed sequence from a PCR product, without using a free
methylated nucleotide during PCR amplification. This demonstration is the first use of a
PCR primer with a methylated recognition domain sequence designed to permit selective
cleavage directed by the primer encoded end of a PCR product. In the context of the
sequencing method of this invention, when generating initial sequencing templates, the
ability to remove PCR primer encoded sequence and its complement at the end to be
sequenced decreases the number of cycles necessary to sequence PCR product that lies
beyond the primer.

There is currently only one commercially available restriction endonuclease,
DpnI, that requires a methylated sequence for cutting. DpnI recognizes the sequence
GATC, where the A is methylated. Cutting by DpnI generates a blunt end. The
methylated A was incorporated into the primer sequence during routine oligonucleotide
synthesis, as methyl A is commercially available as a phosphoramidite. PCR
amplification occurred using regular non-methylated nucleotides, so no portion of any
PCR product, apart from the methylated primer, was methylated. A 55 bp PCR product
was amplified from the plasmid pUC19. This 55 bp PCR product and its 40 bp DpnI
digest product are illustrated in Figure 6, and the denaturing acrylamide gel showing the
original PCR product and its DpnI digestion product is shown in Figure 7.

PCR Product Generation with a Primer Encoded Hemi-Methylated DpnI
Recognition Domain:

PCR amplification of a 55 bp product from 4 ng of the plasmid pUC19 was
carried out using 1.6 µl rTth DNA Polymerase (2.5 U/µl; Perkin Elmer) in a 1x Tth DNA
polymerase buffer (20 mM Tricine pH 8.7, 85 mM KOAc, 8% glycerol, 2% (vol/vol)
DMSO, 1.1 mM Mg(OAc)2), and 200 µM each dNTP with 25 pmoles of each of the
primers shown in the box below, using the following parameters: 94°C for 1 minute
followed by 30 thermal cycles (94°C for 30 seconds, 45°C for 30 seconds), a final
extension at 72°C for 7 minutes, and a 4°C soak.



Primer A: 5' CCATCCGTAAGATGATCTTCTG (SEQ ID NO:40); mismatches to
pUC19 DNA underlined; DpnI recognition sequence in bold type. The A was
methylated, and was incorporated during oligonucleotide synthesis using a methylated
phosphoramidite (Glen Research). Primer B: 5' CTCAGAATGACTTGGTTG (SEQ ID
NO:41).



Digestion with DpnI:

33 µl of this product was digested with 1.0 µl or 5.0 µl DpnI (20 U/µl; New
England BioLabs) in the manufacturer's 1x buffer in a total volume of 40 µl at 37°C for 1
hour. The initial PCR product and its DpnI cut portions were each run on a denaturing
acrylamide gel, as shown in Figure 7. DpnI cut the PCR end to very near completion
(Figure 7). In this example, the DpnI site was created near the 3' end of the primer, and
incorporating this recognition domain required two mismatches to the original template.
This illustrates that DpnI, with its short 4 bp recognition domain, can be readily
incorporated near the 3' end of a primer without preventing PCR amplification. For the
sequencing of inserts cloned in a vector, the recognition domain can be placed in
the immediate 3' end of the amplifying primer, because its nucleotide sequence can be
encoded in the vector adjacent to the inserts to be sequenced. Following digestion with
DpnI, an end is generated that can be ligated to the initial adaptors with offset
recognition domains for the class-IIS restriction endonuclease used in sequencing the
insert.
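
The 55 bp to 40 bp size change reported for this example can be reproduced from the Primer A sequence. The sketch below assumes the usual DpnI geometry, a blunt cut between the A and the T of GATC, and that the product begins at the 5' end of Primer A; the variable names are illustrative.

```python
PRIMER_A = "CCATCCGTAAGATGATCTTCTG"  # SEQ ID NO:40
PRODUCT_BP = 55
SITE = "GATC"                         # DpnI site; cut falls between A and T (blunt)

cut = PRIMER_A.index(SITE) + 2        # nucleotides removed from the primer-encoded end
removed = PRIMER_A[:cut]
remaining_bp = PRODUCT_BP - cut

print(removed, len(removed), remaining_bp)
```

Fifteen primer-encoded nucleotides are removed, leaving the 40 bp digest product described above.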


Equivalents 

Those skilled in the art will be able to recognize, or be able to ascertain using no
more than routine experimentation, numerous equivalents to the specific procedures
described herein. Such equivalents are considered to be within the scope of this
invention and are covered by the following claims.

Claims:

1. A method for identifying a first nucleotide n and a second nucleotide n + x in a
double stranded nucleic acid segment, comprising:
a) digesting said double stranded nucleic acid segment with a restriction enzyme
to produce a double stranded molecule having a single stranded overhang sequence
corresponding to an enzyme cut site;
b) providing an adaptor having a cycle identification tag, a restriction enzyme
recognition domain, a sequence identification region, and a detectable label;
c) hybridizing said adaptor to said double stranded nucleic acid having said
single-stranded overhang sequence to form a ligated molecule;
d) identifying said nucleotide n by identifying said ligated molecule;
e) amplifying said ligated molecule from step (d) with a primer specific for said
cycle identification tag of said adaptor; and
f) repeating steps (a) through (d) on said amplified molecule from step (e) to
yield the identity of said nucleotide n + x,
wherein x is less than or equal to the number of nucleotides between a
recognition domain for a restriction enzyme and an enzyme cut site.

2. The method of claim 1, wherein said enzyme cut site is the cut site located the
farthest away from said recognition domain.

3. The method of claim 1, wherein said restriction enzyme of step (a) is a class-IIS
restriction endonuclease.

4. The method of claim 3, wherein said class-IIS restriction endonuclease is
selected from the group consisting of AccBSI, AceIII, AciI, AclWI, AlwI, Alw26I,
AlwXI, Asp26HI, Asp27HI, Asp35HI, Asp36HI, Asp40HI, Asp50HI, AsuHPI, BaeI,
BbsI, BbvI, BbvII, Bbv16II, Bce83I, BcefI, BcgI, Bco5I, Bco116I, BcoKI, BinI, Bli736I,
BpiI, BpmI, Bpu10I, BpuAI, BsaI, BsaMI, Bsc91I, BscAI, BscCI, Bse1I, Bse3DI, BseNI,
BseRI, BseZI, BsgI, BsiI, BsmI, BsmAI, BsmBI, BsmFI, Bsp24I, Bsp423I, BspBS31I,
BspIS4I, BspKT5I, BspLU11III, BspMI, BspPI, BspST5I, BspTS514I, BsrI, BsrBI,
BsrDI, BsrSI, BssSI, Bst11I, Bst71I, Bst2BI, BstBS32I, BstD102I, BstF5I, BstTS5I,
Bsu6I, CjeI, CjePI, Eam1104I, EarI, Eco31I, Eco57I, EcoA4I, EcoO44I, Esp3I, FauI,
FokI, GdiII, GsuI, HgaI, HphI, Ksp632I, MboII, MlyI, MmeI, MnlI, Mva1269I, PhaI,
PleI, RleAI, SapI, SfaNI, SimI, StsI, TaqII, TspII, TspRI, Tth111II, and VpaK32I.

5. The method of claim 1, wherein a nucleic acid ligase is used to attach at least one
strand of said restriction enzyme recognition domain of step (b) to said nucleic acid
segment.

6. The method of claim 1, wherein said method further comprises blocking an
enzyme recognition domain lying outside said enzyme recognition domain of step (b).

7. The method of claim 6, wherein said blocking occurs through an in vitro primer
extension.

8. The method of claim 7, wherein said in vitro primer extension is DNA
amplification in vitro.

9. The method of claim 8, wherein said DNA amplification in vitro occurs during
said amplification in step (e).

10. The method of claim 7, wherein said in vitro primer extension occurs following
said amplification in step (e).

11. The method of claim 7, wherein said method further comprises hemi-methylating
an enzyme recognition domain lying outside said enzyme recognition domain of step
(b).

12. The method of claim 11, wherein said hemi-methylation occurs through an
in vitro primer extension using a primer having a portion of said enzyme recognition
domain that blocks enzyme recognition if it is hemi-methylated.

13. The method of claim 12, wherein said primer extension occurs with a methylated
nucleotide.

14. The method of claim 7, wherein said restriction endonuclease recognizes a
hemi-methylated recognition domain, and the primer contains at least one methylated
nucleotide in a methylated portion of said recognition domain.

15. The method of claim 1, wherein said nucleic acid segment is a genomic DNA.

16. The method of claim 1, wherein said nucleic acid segment is a cDNA.

17. The method of claim 1, wherein said nucleic acid segment is a product of an
in vitro DNA amplification.

18. The method of claim 1, wherein said nucleic acid segment is a PCR product.

19. The method of claim 1, wherein said nucleic acid segment is a product of a
strand displacement amplification.

20. The method of claim 1, wherein said nucleic acid segment is a vector insert.

21. The method of claim 1, wherein said detectable label is selected from the group
consisting of one or more fluorescent, near infra-red, radionucleotide and
chemiluminescent labels.

22. The method of claim 1, wherein said nucleic acid segment is attached to a solid
matrix.

23. The method of claim 22, wherein said solid matrix is a magnetic streptavidin.

24. The method of claim 22, wherein said solid matrix is a magnetic glass particle.

25. The method of claim 1, wherein said adaptor of step (b) is attached to a solid
matrix.

26. The method of claim 25, wherein said solid matrix is a magnetic streptavidin.

27. The method of claim 25, wherein said solid matrix is a magnetic glass particle.

28. A method for sequencing an interval within a double stranded nucleic acid 
segment by identifying a first nucleotide n and a second nucleotide n + x in a plurality of 
staggered double stranded molecules produced from said double stranded nucleic acid 
segment, comprising: 

a) attaching an enzyme recognition domain to different positions along said
double stranded nucleic acid segment within an interval no greater than the distance
between a recognition domain for a restriction enzyme and an enzyme cut site, such
attachment occurring at one end of said double stranded nucleic acid segment;

b) digesting said double stranded nucleic acid segment with a restriction enzyme
to produce a plurality of staggered double stranded molecules each having a single
stranded overhang sequence corresponding to said cut site;

c) providing an adaptor having a restriction enzyme recognition domain, a
sequence identification region, and a detectable label;

d) hybridizing said adaptor to said double stranded nucleic acid having said
single-stranded overhang sequence to form a ligated molecule;

e) identifying a nucleotide n within a staggered double stranded molecule by
identifying said ligated molecule;

f) repeating steps (b) through (e) to yield the identity of said nucleotide n + x in
each of said staggered double stranded molecules having said single strand overhang
sequence thereby sequencing an interval within said double stranded nucleic acid
segment,

wherein x is greater than one and no greater than the number of nucleotides
between a recognition domain for a restriction enzyme and an enzyme cut site.

29. The method of claim 28, wherein said enzyme cut site is the cut site located the
farthest away from said recognition domain. 

30. The method of claim 28, wherein said restriction enzyme of step (b) is a class-IIS
restriction endonuclease. 
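
Editorial illustration (not part of the claims): the relationship between a class-IIS recognition domain and its displaced cut site can be sketched in Python. The sketch uses FokI, one of the enzymes named in these claims, and assumes its commonly cited specificity GGATG(9/13), that is, cleavage 9 nucleotides downstream on the recognition-site strand and 13 nucleotides downstream on the complementary strand. The function name `foki_cut`, the string model of the top strand, and the example sequence are hypothetical and chosen only for illustration.

```python
FOKI_SITE = "GGATG"   # FokI recognition domain (top strand, 5'->3')
TOP_OFFSET = 9        # assumed top-strand cut: 9 nt past the site
BOTTOM_OFFSET = 13    # assumed bottom-strand cut: 13 nt past the site


def foki_cut(top_strand):
    """Return (top_cut, bottom_cut, overhang) for the first FokI site.

    Indices are zero-based positions on the top strand. `overhang` is the
    stretch left single-stranded between the two staggered cuts, written
    as top-strand sequence. Returns None if there is no site or the
    sequence is too short to contain both cuts.
    """
    site_start = top_strand.find(FOKI_SITE)
    if site_start == -1:
        return None
    site_end = site_start + len(FOKI_SITE)
    top_cut = site_end + TOP_OFFSET
    bottom_cut = site_end + BOTTOM_OFFSET
    if bottom_cut > len(top_strand):
        return None
    return top_cut, bottom_cut, top_strand[top_cut:bottom_cut]


# Hypothetical 22-nt example sequence containing one GGATG site.
result = foki_cut("TTGGATGACGTACGTACCTTGG")
print(result)
```

Because the cut falls outside the recognition domain, the overhang sequence is determined by the target rather than by the enzyme, which is the property the iterative steps of the method rely on.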

31. The method of claim 30, wherein said class-IIS restriction endonuclease is
selected from the group consisting of AccBSI, AceIII, AciI, AclWI, AlwI, Alw26I,
AlwXI, Asp26HI, Asp27HI, Asp35HI, Asp36HI, Asp40HI, Asp50HI, AsuHPI, BaeI,
BbsI, BbvI, BbvII, Bbv16II, Bce83I, BcefI, BcgI, Bco5I, Bco116I, BcoKI, BinI, Bli736I,
BpiI, BpmI, Bpu10I, BpuAI, BsaI, BsaMI, Bsc9II, BscAI, BscCI, Bse1I, Bse3DI, BseNI,
BseRI, BseZI, BsgI, BsiI, BsmI, BsmAI, BsmBI, BsmFI, Bsp24I, Bsp423I, BspBS3II,
BspIS4I, BspKT5I, BspLU11III, BspMI, BspPI, BspST5I, BspTS514I, BsrI, BsrBI,
BsrDI, BsrSI, BssSI, Bst11I, Bst71I, Bst2BI, BstBS32I, BstD102I, BstF5I, BstTS5I,
Bsu6I, CjeI, CjePI, Eam1104I, EarI, Eco31I, Eco57I, EcoA4I, EcoO44I, Esp3I, FauI,
FokI, GdiII, GsuI, HgaI, HphI, Ksp632I, MboII, MlyI, MmeI, MnlI, Mva1269I, PhaI,
PleI, RleAI, SapI, SfaNI, SimI, StsI, TaqII, TspII, TspRI, Tth111II, and VpaK32I.



32. The method of claim 28, wherein a nucleic acid ligase is used to attach at least 
one strand of said restriction enzyme recognition domain of step (c) to said nucleic acid 
segment.

33. The method of claim 28, wherein said method further comprises blocking an
enzyme recognition domain lying outside said enzyme recognition domain of step (c). 

34. The method of claim 33, wherein said method further comprises methylating an 
enzyme recognition domain lying outside said enzyme recognition domain of step (c). 

35. The method of claim 34, wherein said methylation occurs through in vitro 
reaction with a methylase that recognizes the enzyme recognition domain of step (c). 

36. The method of claim 35, wherein said methylase is a FokI methylase.

37. The method of claim 33, wherein said blocking occurs through an in vitro primer 
extension. 

38. The method of claim 37, wherein said in vitro primer extension is DNA 
amplification in vitro. 

39. The method of claim 37, wherein said method further comprises hemi-methylating
an enzyme recognition domain lying outside said enzyme recognition
domain of step (c). 

40. The method of claim 39, wherein said hemi-methylation occurs through an 
in vitro primer extension using a primer having a portion of said enzyme recognition 
domain that blocks enzyme recognition if it is hemi-methylated. 

41. The method of claim 40, wherein said primer extension occurs with a methylated
nucleotide. 

42. The method of claim 37, wherein said restriction endonuclease recognizes a 
hemi-methylated recognition domain, and the primer contains at least one methylated 
nucleotide in a methylated portion of said recognition domain.

43. The method of claim 28, wherein said nucleic acid segment is a genomic DNA.

44. The method of claim 28, wherein said nucleic acid segment is a cDNA. 

45. The method of claim 28, wherein said nucleic acid segment is a product of an 
in vitro DNA amplification. 

46. The method of claim 28, wherein said nucleic acid segment is a PCR product.

47. The method of claim 28, wherein said nucleic acid segment is a product of a 
strand displacement amplification. 

48. The method of claim 28, wherein said nucleic acid segment is a vector insert. 

49. The method of claim 28, wherein said detectable label is selected from the group 
consisting of one or more fluorescent, near infra-red, radionucleotide and 
chemiluminescent labels.

50. The method of claim 28, wherein said nucleic acid segment is attached to a solid
matrix. 

51. The method of claim 50, wherein said solid matrix is a magnetic streptavidin.

52. The method of claim 50, wherein said solid matrix is a magnetic glass particle.

53. The method of claim 28, wherein said adaptor of step (c) is attached to a solid 
matrix. 

54. The method of claim 53, wherein said solid matrix is a magnetic streptavidin.

55. The method of claim 53, wherein said solid matrix is a magnetic glass particle.

56. A method for identifying a first nucleotide n and a second nucleotide n + x in a 
double stranded nucleic acid segment, comprising: 

a) digesting said double stranded nucleic acid segment with a restriction enzyme
to produce a double stranded molecule having a 5' single stranded overhang sequence
corresponding to an enzyme cut site;

b) identifying said nucleotide n by template-directed polymerization with a
labeled nucleotide or nucleotide terminator;

c) providing an adaptor having a cycle identification tag and a restriction enzyme
recognition domain;

d) ligating said adaptor to said double stranded nucleic acid to form a ligated
molecule;

e) amplifying said ligated molecule from step (d) with a primer specific for said
cycle identification tag of said adaptor; and

f) repeating steps (a) through (b) on said amplified molecule from step (e) to 
yield the identity of said nucleotide n + x, 

wherein x is less than or equal to the number of nucleotides between a 
recognition domain for a restriction enzyme and an enzyme cut site. 
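
Editorial illustration (not part of the claims): the iteration recited above identifies nucleotide n, then n + x, and so on as the digest and identify steps repeat. A minimal Python sketch of that position arithmetic follows. The enzymatic steps are abstracted into index arithmetic on a string, and the function name `iterative_reads`, its parameters, and the example sequence are hypothetical.

```python
def iterative_reads(template, n, x, cycles):
    """Return (position, base) pairs read at n, n + x, n + 2x, ...

    template: string standing in for one strand of the nucleic acid segment
    n:        zero-based index of the first nucleotide identified
    x:        nucleotides advanced per cycle (assumed x >= 1; in the claims
              x is bounded by the recognition-domain-to-cut-site distance)
    cycles:   maximum number of digest/identify cycles to model
    """
    if x < 1:
        raise ValueError("x must be at least 1")
    reads = []
    for cycle in range(cycles):
        pos = n + cycle * x
        if pos >= len(template):
            break  # the modelled template is exhausted
        reads.append((pos, template[pos]))
    return reads


# Hypothetical example: read every fourth base, starting at index 0.
print(iterative_reads("ACGTTGCAAGCT", n=0, x=4, cycles=3))
```

With x greater than one, a single template yields only every x-th nucleotide, which is why the interval-sequencing claims use a plurality of staggered molecules to fill the gaps.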

57. The method of claim 56, wherein said enzyme cut site is the cut site located the 
farthest away from said recognition domain. 

58. The method of claim 56, wherein said restriction enzyme of step (a) is a class-IIS 
restriction endonuclease.

59. The method of claim 58, wherein said class-IIS restriction endonuclease is
selected from the group consisting of AccBSI, AceIII, AciI, AclWI, AlwI, Alw26I,
AlwXI, Asp26HI, Asp27HI, Asp35HI, Asp36HI, Asp40HI, Asp50HI, AsuHPI, BaeI,
BbsI, BbvI, BbvII, Bbv16II, Bce83I, BcefI, BcgI, Bco5I, Bco116I, BcoKI, BinI, Bli736I,
BpiI, BpmI, Bpu10I, BpuAI, BsaI, BsaMI, Bsc9II, BscAI, BscCI, Bse1I, Bse3DI, BseNI,
BseRI, BseZI, BsgI, BsiI, BsmI, BsmAI, BsmBI, BsmFI, Bsp24I, Bsp423I, BspBS3II,
BspIS4I, BspKT5I, BspLU11III, BspMI, BspPI, BspST5I, BspTS514I, BsrI, BsrBI,
BsrDI, BsrSI, BssSI, Bst11I, Bst71I, Bst2BI, BstBS32I, BstD102I, BstF5I, BstTS5I,
Bsu6I, CjeI, CjePI, Eam1104I, EarI, Eco31I, Eco57I, EcoA4I, EcoO44I, Esp3I, FauI,
FokI, GdiII, GsuI, HgaI, HphI, Ksp632I, MboII, MlyI, MmeI, MnlI, Mva1269I, PhaI,
PleI, RleAI, SapI, SfaNI, SimI, StsI, TaqII, TspII, TspRI, Tth111II, and VpaK32I.

60. The method of claim 56, wherein a nucleic acid ligase is used to attach at least 
one strand of said restriction enzyme recognition domain of step (c) to said nucleic acid
segment.



61. The method of claim 56, wherein said method further comprises blocking an
enzyme recognition domain lying outside said enzyme recognition domain of step (c). 

62. The method of claim 61, wherein said blocking occurs through an in vitro primer
extension. 

63. The method of claim 62, wherein said in vitro primer extension is DNA 
amplification in vitro. 

64. The method of claim 63, wherein said DNA amplification in vitro occurs during 
said amplification in step (e). 



65. The method of claim 62, wherein said in vitro primer extension occurs following 
said amplification in step (e).

66. The method of claim 62, wherein said method further comprises hemi-methylating
an enzyme recognition domain lying outside said enzyme recognition
domain of step (c). 

67. The method of claim 66, wherein said hemi-methylation occurs through an in
vitro primer extension using a primer having a portion of said enzyme recognition 
domain that blocks enzyme recognition if it is hemi-methylated. 

68. The method of claim 67, wherein said primer extension occurs with a methylated 
nucleotide.

69. The method of claim 62, wherein said restriction endonuclease recognizes a
hemi-methylated recognition domain, and the primer contains at least one methylated
nucleotide in a methylated portion of said recognition domain. 

70. The method of claim 56, wherein said nucleic acid segment is a genomic DNA. 

71. The method of claim 56, wherein said nucleic acid segment is a cDNA. 

72. The method of claim 56, wherein said nucleic acid segment is a product of an in
vitro DNA amplification. 

73. The method of claim 56, wherein said nucleic acid segment is a PCR product.

74. The method of claim 56, wherein said nucleic acid segment is a product of a
strand displacement amplification. 

75. The method of claim 56, wherein said nucleic acid segment is a vector insert. 

76. The method of claim 56, wherein said label is selected from the group consisting
of one or more fluorescent, near infra-red, radionucleotide and chemiluminescent labels.

77. The method of claim 56, wherein said nucleic acid segment is attached to a solid
matrix. 

78. The method of claim 77, wherein said solid matrix is a magnetic streptavidin. 

79. The method of claim 77, wherein said solid matrix is a magnetic glass particle. 

80. The method of claim 56, wherein said adaptor of step (c) is attached to a solid 
matrix.

81. The method of claim 80, wherein said solid matrix is a magnetic streptavidin.

82. The method of claim 80, wherein said solid matrix is a magnetic glass particle. 

83. The method of claim 56, wherein said step (a) is modified to generate a blunt end
in said nucleic acid segment. 

84. The method of claim 83, wherein said step (b) is modified to identify a 
nucleotide in said blunt end of said nucleic acid segment by using a 3' exonuclease
activity of a DNA polymerase to generate a single nucleotide long single-stranded
nucleic acid template. 

85. The method of claim 84, said method further comprising sequencing said 
nucleotide by a template-directed polymerization with a labeled nucleotide or nucleotide
terminator.



86. The method of claim 85, wherein said template-directed polymerization is 
followed by identification of an incorporated label.

87. A method for sequencing an interval within a double stranded nucleic acid
segment by identifying a first nucleotide n and a second nucleotide n + x in a plurality of 
staggered double stranded molecules produced from said double stranded nucleic acid 
segment, comprising: 

a) attaching an enzyme recognition domain to different positions along said
double stranded nucleic acid segment within an interval no greater than the distance
between a recognition domain for a restriction enzyme and an enzyme cut site, such
attachment occurring at one end of said double stranded nucleic acid segment;

b) digesting said double stranded nucleic acid segment with a restriction enzyme
to produce a plurality of staggered double stranded molecules each having a 5' single
stranded overhang sequence corresponding to said cut site;

c) identifying a nucleotide n within a staggered double stranded molecule by
template-directed polymerization with a labeled nucleotide or nucleotide terminator;

d) providing an adaptor having a restriction enzyme recognition domain;

e) ligating said adaptor to said double stranded nucleic acid to form a ligated
molecule;

f) repeating steps (b) through (c) to yield the identity of said nucleotide n + x in
each of said staggered double stranded molecules having said single strand overhang
sequence thereby sequencing an interval within said double stranded nucleic acid
segment,

wherein x is greater than one and no greater than the number of nucleotides 
between a recognition domain for a restriction enzyme and an enzyme cut site. 

88. The method of claim 87, wherein said enzyme cut site is the cut site located the 
farthest away from said recognition domain.

89. The method of claim 87, wherein said restriction enzyme of step (b) is a class-IIS 
restriction endonuclease.

90. The method of claim 89, wherein said class-IIS restriction endonuclease is
selected from the group consisting of AccBSI, AceIII, AciI, AclWI, AlwI, Alw26I,
AlwXI, Asp26HI, Asp27HI, Asp35HI, Asp36HI, Asp40HI, Asp50HI, AsuHPI, BaeI,
BbsI, BbvI, BbvII, Bbv16II, Bce83I, BcefI, BcgI, Bco5I, Bco116I, BcoKI, BinI, Bli736I,
BpiI, BpmI, Bpu10I, BpuAI, BsaI, BsaMI, Bsc9II, BscAI, BscCI, Bse1I, Bse3DI, BseNI,
BseRI, BseZI, BsgI, BsiI, BsmI, BsmAI, BsmBI, BsmFI, Bsp24I, Bsp423I, BspBS3II,
BspIS4I, BspKT5I, BspLU11III, BspMI, BspPI, BspST5I, BspTS514I, BsrI, BsrBI,
BsrDI, BsrSI, BssSI, Bst11I, Bst71I, Bst2BI, BstBS32I, BstD102I, BstF5I, BstTS5I,
Bsu6I, CjeI, CjePI, Eam1104I, EarI, Eco31I, Eco57I, EcoA4I, EcoO44I, Esp3I, FauI,
FokI, GdiII, GsuI, HgaI, HphI, Ksp632I, MboII, MlyI, MmeI, MnlI, Mva1269I, PhaI,
PleI, RleAI, SapI, SfaNI, SimI, StsI, TaqII, TspII, TspRI, Tth111II, and VpaK32I.

91. The method of claim 87, wherein a nucleic acid ligase is used to attach at least
one strand of said restriction enzyme recognition domain of step (d) to said nucleic acid
segment.

92. The method of claim 87, wherein said method further comprises blocking an 
enzyme recognition domain lying outside said enzyme recognition domain of step (d). 

93. The method of claim 92, wherein said method further comprises methylating an
enzyme recognition domain lying outside said enzyme recognition domain of step (d). 

94. The method of claim 93, wherein said methylation occurs through in vitro 
reaction with a methylase that recognizes the enzyme recognition domain of step (d). 

95. The method of claim 94, wherein said methylase is a FokI methylase.

96. The method of claim 92, wherein said blocking occurs through an in vitro primer
extension.

97. The method of claim 96, wherein said in vitro primer extension is DNA
amplification in vitro. 

98. The method of claim 96, wherein said method further comprises hemi-methylating
an enzyme recognition domain lying outside said enzyme recognition
domain of step (d). 

99. The method of claim 98, wherein said hemi-methylation occurs through an 
in vitro primer extension using a primer having a portion of said enzyme recognition
domain that blocks enzyme recognition if it is hemi-methylated.

100. The method of claim 99, wherein said primer extension occurs with a methylated 
nucleotide. 



101. The method of claim 96, wherein said restriction endonuclease recognizes a
hemi-methylated recognition domain, and the primer contains at least one methylated 
nucleotide in a methylated portion of said recognition domain. 

102. The method of claim 87, wherein said nucleic acid segment is a genomic DNA. 

103. The method of claim 87, wherein said nucleic acid segment is a cDNA. 

104. The method of claim 87, wherein said nucleic acid segment is a product of an 
in vitro DNA amplification. 

105. The method of claim 87, wherein said nucleic acid segment is a PCR product.

106. The method of claim 87, wherein said nucleic acid segment is a product of a
strand displacement amplification. 

107. The method of claim 87, wherein said nucleic acid segment is a vector insert.

108. The method of claim 87, wherein said detectable label is selected from the group
consisting of one or more fluorescent, near infra-red, radionucleotide and
chemiluminescent labels.

109. The method of claim 87, wherein said nucleic acid segment is attached to a solid
matrix. 

110. The method of claim 109, wherein said solid matrix is a magnetic streptavidin.

111. The method of claim 109, wherein said solid matrix is a magnetic glass particle.

112. The method of claim 87, wherein said adaptor of step (d) is attached to a solid
matrix.

113. The method of claim 112, wherein said solid matrix is a magnetic streptavidin.

114. The method of claim 112, wherein said solid matrix is a magnetic glass particle.

115. The method of claim 87, wherein said step (b) is modified to generate a blunt end
in said nucleic acid segment.

116. The method of claim 115, wherein said step (c) is modified to identify a
nucleotide in said blunt end of said nucleic acid segment by using a 3' exonuclease
activity of a DNA polymerase to generate a single nucleotide long single-stranded
nucleic acid template.

117. The method of claim 116, said method further comprising sequencing said
nucleotide by a template-directed polymerization with a labeled nucleotide or nucleotide
terminator.

118. The method of claim 117, wherein said template-directed polymerization is
followed by identification of an incorporated label. 

119. A method for removing all or a part of a primer sequence from a primer extended 
product, comprising: 

a) providing a primer sequence encoding a methylated portion of a restriction 
endonuclease recognition domain, wherein recognition of said domain by a restriction 
endonuclease requires at least one methylated nucleotide; 

b) polymerizing by a template-directed primer extension using said primer and a 
nucleic acid segment to generate a primer extended product; and 

c) digesting said primer extended product with a restriction endonuclease that 
recognizes the resulting double-stranded restriction endonuclease recognition domain 
encoded by said primer sequence in said primer extended product. 

120. The method of claim 119, wherein a sequence complementary to said primer
sequence is also removed by said restriction endonuclease digestion in said step (c). 

121. The method of claim 119, wherein said restriction endonuclease of step (c) is a
class-IIS restriction endonuclease. 

122. The method of claim 121, wherein said digestion with said class-IIS restriction
endonuclease of step (c) generates a single-strand extension no longer than 10 
nucleotides in length that is not encoded by said primer encoding at least part of said 
restriction endonuclease recognition domain. 

123. The method of claim 119, wherein said template-directed primer extension in
said step (b) occurs during nucleic acid amplification in vitro. 

124. The method of claim 123, wherein said nucleic acid amplification in vitro is 
linear.

125. The method of claim 123, wherein said nucleic acid amplification in vitro is
exponential. 

126. The method of claim 125, wherein said nucleic acid amplification in vitro is 
PCR.

127. The method of claim 125, wherein said nucleic acid amplification in vitro is 
strand displacement amplification. 

128. A method for blocking a restriction endonuclease recognition domain in a primer
extended product, comprising: 

a) providing a primer with at least one modified nucleotide, wherein said
modified nucleotide blocks an enzyme recognition domain, and at least a portion of said
enzyme recognition domain sequence is encoded in said primer;

b) polymerizing by a template-directed primer extension using said primer and a
nucleic acid segment to generate a primer extended product; and

c) digesting said primer extended product with an enzyme that recognizes a 
double-stranded enzyme recognition domain in said primer extended product. 

129. The method of claim 128, wherein said modified nucleotide is a methylated
nucleotide. 

130. The method of claim 128, wherein said template directed primer extension in 
said step (b) occurs during nucleic acid amplification in vitro. 

131. The method of claim 130, wherein said amplification in vitro is linear. 

132. The method of claim 130, wherein said amplification in vitro is exponential.

133. The method of claim 132, wherein said amplification in vitro is PCR.

134. The method of claim 132, wherein said amplification in vitro is strand
displacement amplification. 

135. The method of claim 128, wherein said nucleic acid template is part of a 
construct consisting of an insert in a vector. 

136. A method for automated sequencing of double-stranded DNA segments with 
nested single strand overhang templates, such method comprising the steps of 

i) providing a support array having a plurality of sample holders arrayed in a 
matrix of positions on the support 

ii) immobilizing a plurality of double-stranded DNA segments at respective 
sample holders of said array, each DNA segment having an end comprising a single-
strand overhang template sequence no longer than about twenty nucleotides in length

iii) simultaneously treating all sample holders with one or more reagents which 
selectively react with at least one nucleotide of said single-strand overhang template to 
effectively label the material at each holder 

iv) reading said array by automated scan detection to thereby determine at least 
one nucleotide of said single-strand overhang template, and 

v) reducing length of each strand of said DNA segment at each holder by a fixed 
number n > 1 at said overhang end to produce a homologously ordered array of shorter 
and nested DNA segments, each with a single-strand overhang template sequence, and 
further performing steps iii) and iv) to determine at least one nucleotide at each single- 
strand overhang sequence, wherein the steps of treating, reading and reducing the length 
of the strands of the DNA segment at each holder by a number of n > 1 nucleotides are 
iteratively performed as automated process steps to produce nested and progressively 
shorter DNA segments and to sequence the plurality of DNA segments immobilized at 
the array of sample holders in situ. 

137. The method of claim 136, wherein said array is a chip or a microtiter support 
array.

138. The method of claim 136, wherein the array is on a stage.

139. The method of claim 138, wherein said stage is rotatable for spinning to cause 
fluid provided at a central position thereof to flow across the array by centrifugal flow,
and wherein the step of treating with one or more reagents includes flowing a reagent
through said array to alter material immobilized in the sample holders. 

140. The method of claim 138, wherein said stage includes heat cycling means for 
cyclically heating the support array, and the step of treating includes treating at least a
portion of material at each sample holder with a primer and operating the heat cycling
means to regenerate material at the respective sample holders. 

141. The method of claim 136, wherein step i) is preceded by treating each initial
DNA segment to produce a set of n DNA segments with respective nested single-strand
templates, and thereafter reducing the length of each template in intervals of n
nucleotides so that the nested sequences from said n templates provide a continuous
sequence for said initial DNA segment, thereby increasing the length of continuous 
DNA sequenced for a given number of steps. 
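
Editorial illustration (not part of the claims): the way n nested templates, each shortened in intervals of n nucleotides, combine into one continuous sequence can be sketched in Python. The model assumes that template i starts at offset i and contributes one base per cycle, so cycle k of template i covers position i + k * n of the initial segment. The function name `interleave_nested_reads` and the example reads are hypothetical.

```python
def interleave_nested_reads(reads_per_template):
    """Merge per-template reads into one continuous sequence.

    reads_per_template[i] is the string of bases read, cycle by cycle,
    from the nested template whose overhang starts at offset i. With n
    templates, cycle k of template i maps to position i + k * n.
    """
    n = len(reads_per_template)
    by_position = {}
    for offset, reads in enumerate(reads_per_template):
        for cycle, base in enumerate(reads):
            by_position[offset + cycle * n] = base

    # Emit only the gap-free prefix, so a missing read is never skipped over.
    merged = []
    pos = 0
    while pos in by_position:
        merged.append(by_position[pos])
        pos += 1
    return "".join(merged)


# Hypothetical example: three nested templates, three cycles each.
print(interleave_nested_reads(["ATC", "CTA", "GGA"]))
```

In this model, c cycles on n templates cover n * c contiguous positions, whereas c cycles on a single template cover only c positions spaced n apart.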

142. The method of claim 136, wherein the step of reducing length to produce a
homologously ordered array of DNA segments includes the steps of transferring an 
aliquot of material from each sample holder to a corresponding sample holder on a 
separate support array, and enzymatically removing a fixed length of > one nucleotide 
from each DNA strand. 

143. The method of claim 141, wherein the step of treating each initial DNA segment 
to produce a set of n DNA segments with respective nested single-strand templates 
includes the steps of transferring an aliquot of material from each sample holder to a 
corresponding sample holder on a separate support array. 

144. The method of claim 139, wherein the step of reducing the length of each strand
by n nucleotides reduces by n < 60 nucleotides, and said automated process steps are
performed by arranging around a circumference on said stage m support arrays A1,
A2, ..., Am, each of said m support arrays communicating at a radially inner point with one
fluid support channel of a set of m fluid supply channels C1, C2, ..., Cm, such that all
sample holders of an array are treated with a flow of a common reagent.

145. The method of claim 144, wherein m > n, and arranging that each array Ai
receives reagents along channel Ci to form an overhang at position i with respect to the
original DNA segment, whereby each sample is sequenced in steps of > 1 and < n
nucleotides and the m arrays span the full sequence of nucleotides over a continuous 
span of each double-stranded DNA segment. 

146. The method of claim 144, wherein said m fluid supply channels are provided
with reagents effective to label the templates in array A1, A2, ..., Am, and the step of
reading m successive nucleotides by scanning the corresponding sample holders on each
of the m support arrays after reducing said length. 

147. The method of claim 136, wherein the step of immobilizing a plurality of DNA
segments at respective sample holders of an array includes immobilizing a plurality of
DNA segments and creating a single strand overhang template on each immobilized
DNA segment in situ. 

148. The method of claim 147, wherein the single strand overhang sequence is created 
by a process including ligation of a strand of a recognition domain to each template and
digestion by an enzyme that cuts at a site at least one nucleotide away from the
recognition domain.

149. The method of claim 148, wherein said enzyme is a class-IIS restriction
endonuclease.

150. The method of claim 149, wherein ligation of a recognition domain strand
includes ligation of a DNA sequence that can be used to generate a primer annealing site 
during DNA amplification in vitro following ligation of the recognition domain and 
prior to generation of the DNA template. 

151. The method of claim 150, wherein DNA amplification in vitro occurs through
PCR. 

152. The method of claim 150, further comprising the step of separating an aliquot
from each sample holder of the array to a further sample holder and amplifying material
of the aliquot by DNA amplification in vitro. 

153. The method of claim 152, wherein the step of separating an aliquot includes 
immobilizing the aliquot on a hedgehog comb. 

154. The method of claim 151, further comprising the step of retaining an aliquot in
each sample holder of the array and amplifying material of the aliquot by DNA 
amplification in vitro. 

155. The method of claim 150, wherein the method of DNA amplification is of low
magnitude by making the DNA templates relatively inaccessible to primer annealing. 

156. The method of claim 155, wherein DNA templates are made relatively
inaccessible to primer annealing through immobilization. 

157. The method of claim 150, further including the step of methylating sites of the
segments outside the ligated recognition domain strand.

158. A method for automated sequencing of double stranded DNA segments, such
method being characterized by steps of 

attaching a recognition domain to each segment to form a set of DNA segments
having the recognition domain nested at an interval no greater than the distance between
the recognition domain and its cut site for a given enzyme that recognizes said
recognition domain

treating the DNA segments with an enzyme that recognizes said attached
recognition domain, and cuts each strand of each DNA segment to create an overhang
template at a distance of > 1 nucleotide along the DNA segment from said recognition
domain, and thereby generating a set of nested overhang templates,

determining at least one nucleotide of each of said nested overhang templates,
and thereafter

reducing length of each strand at the end of the DNA segment with the overhang
template by > 1 nucleotide to produce a corresponding set of shorter DNA segments
each with an overhang template, said step of reducing being performed by removing a
block of nucleotides, whereby each shorter DNA segment with an overhang template is a
known subinterval of a previous DNA segment with overhang.



WO 99/45153 PCT/US99/04883 

159. A method for automated sequencing of double-stranded DNA segments, such
method comprising the steps of 

i) providing a support array having a plurality of sample holders arrayed in a 
matrix of positions on the support 
ii) immobilizing a plurality of double-stranded DNA segments at respective
sample holders of said array, each DNA segment having an end comprising a single-
strand overhang template sequence no longer than about twenty nucleotides in length

iii) simultaneously treating all sample holders with one or more reagents which
selectively react with at least one nucleotide of said single-strand overhang template to
effectively label the material at each holder

iv) reading said array by automated scan detection to thereby determine at least
one nucleotide of said single-strand overhang template

v) regenerating material at the respective sample holders by DNA amplification
in vitro

vi) reducing length of each strand of said DNA segment at each holder by a fixed
number n > 1 at said overhang end to produce a homologously ordered array of trimmed
DNA segments, each with a single-strand overhang template sequence, and further
performing step iii) to determine at least one nucleotide at each single-strand overhang
sequence, wherein the steps of treating, reading, reducing lengths and product
regeneration are iteratively performed as automated process steps to produce
progressively trimmed DNA segments and to sequence the plurality of DNA segments
immobilized at the array of sample holders in situ. 
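
The read-then-trim cycle recited in claim 159 can be illustrated with a minimal sketch. This is a toy string model only, not the claimed laboratory process: the function name `iterative_reads` and its parameters `read_len` and `max_cycles` are hypothetical, and the segment is treated as a plain string read from the overhang end.

```python
from typing import List, Optional, Tuple


def iterative_reads(segment: str, n: int, read_len: int = 1,
                    max_cycles: Optional[int] = None) -> List[Tuple[int, str]]:
    """Toy model of the read-then-trim cycle (hypothetical helper).

    Each cycle records `read_len` bases at the current end of the segment
    (standing in for the overhang template), then trims `n` bases from that
    end before the next cycle. Returns one (position, bases) pair per cycle,
    where position is counted on the original, untrimmed segment.
    """
    if n < 1 or read_len < 1:
        raise ValueError("n and read_len must be at least 1")
    reads = []
    pos = 0
    while pos + read_len <= len(segment):
        if max_cycles is not None and len(reads) >= max_cycles:
            break
        reads.append((pos, segment[pos:pos + read_len]))
        pos += n  # trimming by a fixed block of n nucleotides
    return reads


if __name__ == "__main__":
    # One base read per cycle, three bases trimmed per cycle.
    print(iterative_reads("ACGTACGT", n=3))
```

Because the trim length is fixed, the position of every read on the original segment is known in advance, which is the sense in which each trimmed segment is a known subinterval of the previous one.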

160. The method of claim 159, wherein said array is a chip or a microtiter support 
array.



161. The method of claim 159, wherein the array is on a stage.

162. The method of claim 161, wherein said stage is rotatable for spinning to cause
fluid provided at a central position thereof to flow across the array by centrifugal flow, 
and wherein the step of treating with one or more reagents includes flowing a reagent 
through said array to alter material immobilized in the sample holders. 

163. The method of claim 161, wherein said stage includes heat cycling means for
cyclically heating the support array, and the step of treating includes treating at least a 
portion of material at each sample holder with a primer and operating the heat cycling 
means to regenerate material at the respective sample holders. 

164. The method of claim 159, wherein n > 1, and step i) is preceded by treating each
initial DNA segment to produce a set of n DNA segments with respective nested
single-strand templates, and thereafter reducing the length of each template in intervals of n
nucleotides so that the nested sequences from said n templates provide a continuous
sequence for said initial DNA segment, thereby increasing the length of continuous
DNA sequenced for a given number of steps.
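
The nesting arrangement of claim 164 can be sketched as follows, under the simplifying assumption that each of the n nested templates starts one nucleotide further along the segment and yields one base per cycle. The helper names `nested_reads`, `assemble` and `cycles_per_template` are hypothetical and are used only for illustration.

```python
from typing import Dict


def nested_reads(segment: str, n: int) -> Dict[int, str]:
    """Collect one base per cycle from n nested templates (toy model).

    Template `offset` starts at position `offset` and is trimmed in
    intervals of n, so it reports positions offset, offset + n, ...
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    reads: Dict[int, str] = {}
    for offset in range(n):
        for pos in range(offset, len(segment), n):
            reads[pos] = segment[pos]
    return reads


def assemble(reads: Dict[int, str]) -> str:
    """Interleave the per-template reads by position."""
    return "".join(reads[pos] for pos in sorted(reads))


def cycles_per_template(length: int, n: int) -> int:
    """Number of read cycles one template needs: ceil(length / n)."""
    return -(-length // n)


if __name__ == "__main__":
    seq = "ACGTACGTAC"
    print(assemble(nested_reads(seq, 3)))    # reassembled continuous sequence
    print(cycles_per_template(len(seq), 3))  # cycles with three nested templates
    print(cycles_per_template(len(seq), 1))  # cycles with a single template
```

In this model the n templates together cover every position, while each template individually needs only about 1/n as many cycles as a single template trimmed one nucleotide at a time.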

165. The method of claim 159, wherein the step of reducing length to produce a
homologously ordered array of DNA segments includes the steps of transferring an
aliquot of material from each sample holder to a corresponding sample holder on a
separate support array.

166. The method of claim 164, wherein the step of treating each initial DNA segment
to produce a set of n DNA segments with respective nested single-strand templates
includes the steps of transferring an aliquot of material from each sample holder to a
corresponding sample holder on a separate support array.

167. The method of claim 162, wherein the step of reducing the length of each strand
by n nucleotides reduces by n < 60 nucleotides, and said automated process steps are
performed by arranging around a circumference on said stage m support arrays A1,
A2 .. Am, each of said m support arrays communicating at a radially inner point with one
fluid support channel of a set of m fluid supply channels C1, C2 .. Cm, such that all
sample holders of an array are treated with a flow of a common reagent.

168. The method of claim 167, wherein m > n, and arranging that each array Ai
receives reagents along channel Ci to form an overhang at position i with respect to the
original DNA segment, whereby each sample is sequenced in steps of > 1 and < n
nucleotides and the m arrays span the full sequence of nucleotides over a continuous
span of each double-stranded DNA segment.

169. The method of claim 167, wherein said m fluid supply channels are provided
with reagents effective to label the templates in array A1, A2 .. Am, and the step of
reading m successive nucleotides by scanning the corresponding sample holders on each
of the m support arrays after reducing said length.

170. The method of claim 159, wherein the step of immobilizing a plurality of DNA
segments at respective sample holders of an array includes immobilizing a plurality of
DNA segments and creating a single strand overhang template on each immobilized
DNA segment in situ.

171. The method of claim 170, wherein the single strand overhang sequence is created
by a process including ligation of a recognition domain strand to each template and
digestion by an enzyme that cuts at a site at least one nucleotide away from the
recognition domain.

172. The method of claim 171, wherein said enzyme is a class-IIS restriction
endonuclease.
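
How an enzyme that cuts at a distance from its recognition domain leaves an overhang can be sketched with a toy coordinate model. The function `type_iis_cut` and its parameters are hypothetical; the example values (site GGATG, cuts 9 and 13 nucleotides downstream) are those commonly catalogued for FokI and are used here only as an assumed illustration, not as the enzyme required by the claims.

```python
from typing import Optional, Tuple


def type_iis_cut(top_strand: str, site: str, top_offset: int,
                 bottom_offset: int) -> Optional[Tuple[int, int, str]]:
    """Locate staggered cut positions downstream of a recognition site.

    Offsets are counted from the end of the recognition site along the
    top strand. Returns (top_cut, bottom_cut, span), where `span` is the
    stretch of top-strand bases lying between the two cuts, i.e. the
    region left single-stranded after digestion. Returns None when the
    site is absent or the cuts would fall beyond the end of the segment.
    """
    start = top_strand.find(site)
    if start == -1:
        return None
    end = start + len(site)
    top_cut = end + top_offset
    bottom_cut = end + bottom_offset
    if max(top_cut, bottom_cut) > len(top_strand):
        return None
    lo, hi = sorted((top_cut, bottom_cut))
    return top_cut, bottom_cut, top_strand[lo:hi]


if __name__ == "__main__":
    # Assumed example: site GGATG, cuts 9 and 13 nt downstream.
    segment = "GGATG" + "ACGTACGTA" + "TTTT" + "CCCC"
    print(type_iis_cut(segment, "GGATG", 9, 13))
```

Because the cut positions depend only on the location of the recognition domain, moving that domain by ligation moves the overhang by a known amount.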

173. The method of claim 172, wherein ligation of a strand of a recognition domain
includes ligation of a DNA sequence that can be used to generate a primer annealing site
during DNA amplification in vitro following ligation of the recognition domain and
prior to generation of the DNA template.

174. The method of claim 173, wherein DNA amplification in vitro occurs through
PCR. 

175. The method of claim 173, further comprising the step of separating an aliquot 
from each sample holder of the array to a further sample holder and amplifying material 
of the aliquot by DNA amplification in vitro. 

176. The method of claim 175, wherein the step of separating an aliquot includes 
immobilizing the aliquot on a hedgehog comb. 

177. The method of claim 173, further comprising the step of retaining an aliquot in 
each sample holder of the array and amplifying material of the aliquot by DNA 
amplification in vitro. 

178. The method of claim 173, wherein the method of DNA amplification is of low
magnitude by making the DNA templates relatively inaccessible to primer annealing.

179. The method of claim 178, wherein DNA templates are made relatively
inaccessible to primer annealing through immobilization.

180. The method of claim 173, further including the step of methylating sites of the
segments outside the ligated recognition domain strand.

181. A method for automated sequencing of double stranded DNA segments, such
method being characterized by steps of

attaching a recognition domain to each segment to form DNA segments having
the recognition domain.

regenerating the template precursor by DNA amplification in vitro

treating the DNA segments with an enzyme that recognizes said attached
recognition domain, and cuts each strand of each DNA segment to create an overhang
template at a distance of > 1 nucleotide along the DNA segment from said recognition
domain

determining at least one nucleotide of said overhang template, and thereafter

reducing length of each strand at the end of the DNA segment with the overhang
template by > 1 nucleotide to produce a corresponding set of trimmed DNA segments
each with an overhang template, said step of reducing being performed by removing a
block of nucleotides, whereby each trimmed DNA segment with an overhang template
is a known subinterval of a previous DNA segment with overhang.

182. A method for identifying a first nucleotide n and a second nucleotide n + x in a 
double stranded nucleic acid segment, comprising: 

a) digesting said double stranded nucleic acid segment with a restriction enzyme 
to produce a double stranded molecule having a single stranded overhang sequence 
corresponding to an enzyme cut site; 

b) providing an adaptor having a cycle identification tag, a restriction enzyme 
recognition domain and a sequence identification region; 

c) hybridizing said adaptor to said double stranded nucleic acid having said 
single-stranded overhang sequence to form a ligated molecule; 

d) amplifying said ligated molecule from step (c) with a labeled primer specific 
for said cycle identification tag, restriction enzyme recognition domain, and a portion of 
said sequence identification region of said adaptor; 

e) identifying said nucleotide n by identifying said primer incorporated into the
amplification product; and

f) repeating steps (a) through (e) on said amplified molecule from step (e) to
yield the identity of said nucleotide n + x, wherein x is less than or equal to the number 
of nucleotides between a recognition domain for a restriction enzyme and an enzyme cut 
site. 

183. A method for identifying a first nucleotide n and a second nucleotide n + x in a
double stranded nucleic acid segment, comprising: 

a) digesting said double stranded nucleic acid segment with a restriction enzyme, 
resulting in a trimmed end in said double stranded molecule; 

b) providing an adaptor having a cycle identification tag and a restriction enzyme 
recognition domain; 

c) ligating said adaptor to the trimmed end of said double stranded nucleic acid to 
form a ligated molecule; 

d) amplifying said ligated molecule from step (c) with a labeled primer specific 
for said cycle identification tag and said restriction enzyme recognition domain of the 
adaptor, and for a nucleotide in said trimmed end in said double stranded molecule; 

e) identifying said nucleotide n by identifying said primer incorporated into the 
amplification product; and 

f) repeating steps (a) through (e) on said amplified molecule from step (e) to 
yield the identity of said nucleotide n + x, wherein x is less than or equal to the number 
of nucleotides between a recognition domain for a restriction enzyme and an enzyme cut 
site. 

184. A method for removing all or part of a primer sequence from a primer extended
product, comprising: 

a) providing a primer sequence encoding a portion of a restriction endonuclease 
recognition domain; 

b) polymerizing by a template-directed primer extension using said primer, a
methylated nucleotide, and a nucleic acid segment to generate a primer extended product
during nucleic acid amplification in vitro, wherein the non-methylated nucleotide
corresponding to the methylated nucleotide is contained within said portion of the
recognition domain sequence in said primer sequence; and

c) digesting said primer extended product with a restriction endonuclease that
recognizes the resulting hemi-methylated double-stranded restriction endonuclease
recognition domain encoded by said primer sequence in said primer extended product,
and does not recognize the double-methylated products resulting from said nucleic acid
amplification in vitro.

SEQUENCE LISTING

(1) GENERAL INFORMATION:

    (i) APPLICANT:
        (A) NAME: THE UNIVERSITY OF IOWA RESEARCH FOUNDATION
        (B) STREET: 214 TECHNOLOGY INNOVATION CENTER,
            OAKDALE RESEARCH CAMPUS
        (C) CITY: IOWA CITY
        (D) STATE: IOWA
        (E) COUNTRY: US
        (F) POSTAL CODE: 52319
        (G) TELEPHONE:
        (H) TELEFAX:

    (ii) TITLE OF INVENTION: AN ITERATIVE AND REGENERATIVE
        DNA SEQUENCE METHOD

    (iii) NUMBER OF SEQUENCES: 41

    (iv) CORRESPONDENCE ADDRESS:
        (A) ADDRESSEE: LAHIVE & COCKFIELD, LLP
        (B) STREET: 28 STATE STREET
        (C) CITY: BOSTON
        (D) STATE: MASSACHUSETTS
        (E) COUNTRY: US
        (F) ZIP: 02109

    (v) COMPUTER READABLE FORM:
        (A) MEDIUM TYPE: Floppy disk
        (B) COMPUTER: IBM PC compatible
        (C) OPERATING SYSTEM: PC-DOS/MS-DOS
        (D) SOFTWARE: PatentIn Release #1.0, Version #1.25

    (vi) CURRENT APPLICATION DATA:
        (A) APPLICATION NUMBER: PCT/US99/
        (B) FILING DATE: 04 MARCH 1999
        (C) CLASSIFICATION:

    (vii) PRIOR APPLICATION DATA:
        (A) APPLICATION NUMBER: 08/035,183
        (B) FILING DATE: 05 MARCH 1998

    (viii) ATTORNEY/AGENT INFORMATION:
        (A) NAME: HANLEY, ELIZABETH A.
        (B) REGISTRATION NUMBER: 33,505
        (C) REFERENCE/DOCKET NUMBER: UIZ-022CPPC

    (ix) TELECOMMUNICATION INFORMATION:
        (A) TELEPHONE: (617) 227-7400
        (B) TELEFAX: (617) 742-4214

(2) INFORMATION FOR SEQ ID NO: 1:

    (i) SEQUENCE CHARACTERISTICS:
        (A) LENGTH: 23 base pairs
        (B) TYPE: nucleic acid
        (C) STRANDEDNESS: single
        (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: DNA

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:

CNNNCATCCG ACCCAGGCGT GCG 23

(2) INFORMATION FOR SEQ ID NO: 2:

    (i) SEQUENCE CHARACTERISTICS:
        (A) LENGTH: 23 base pairs
        (B) TYPE: nucleic acid
        (C) STRANDEDNESS: single
        (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: DNA

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:

ANNNCATCCG ACCCAGGCGT GCG 23

(2) INFORMATION FOR SEQ ID NO: 3:

    (i) SEQUENCE CHARACTERISTICS:
        (A) LENGTH: 23 base pairs
        (B) TYPE: nucleic acid
        (C) STRANDEDNESS: single
        (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: DNA

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:

TNNNCATCCG ACCCAGGCGT GCG 23

(2) INFORMATION FOR SEQ ID NO: 4:

    (i) SEQUENCE CHARACTERISTICS:
        (A) LENGTH: 23 base pairs
        (B) TYPE: nucleic acid
        (C) STRANDEDNESS: single
        (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: DNA

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:

GNNNCATCCG ACCCAGGCGT GCG 23

(2) INFORMATION FOR SEQ ID NO: 5:

    (i) SEQUENCE CHARACTERISTICS:
        (A) LENGTH: 19 base pairs
        (B) TYPE: nucleic acid
        (C) STRANDEDNESS: single
        (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: DNA

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:

CGCACGCCTG GGTCGGATG 19
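
As a consistency check on the entries above, the listed strings can be compared programmatically. The sketch below is an observation drawn only from the strings as printed: the 19-nucleotide SEQ ID NO: 5 equals the reverse complement of SEQ ID NO: 1 with its first four positions (C plus NNN) removed. The listing itself does not state how the oligonucleotides pair, so this relationship should be read as an assumption, and the helper `reverse_complement` is a hypothetical name.

```python
# Complement table; N (any base) maps to N.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


# Sequences copied from the listing, with the spacing removed.
SEQ_ID_1 = "CNNNCATCCGACCCAGGCGTGCG"
SEQ_ID_5 = "CGCACGCCTGGGTCGGATG"

if __name__ == "__main__":
    print(len(SEQ_ID_1), len(SEQ_ID_5))                 # stated lengths: 23 and 19
    print(reverse_complement(SEQ_ID_1[4:]) == SEQ_ID_5)  # constant region check
```

The same check can be applied to the other entries that share the constant region.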

(2) INFORMATION FOR SEQ ID NO: 6:

    (i) SEQUENCE CHARACTERISTICS:
        (A) LENGTH: 24 base pairs
        (B) TYPE: nucleic acid
        (C) STRANDEDNESS: single
        (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: DNA

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:

CNNNCATCCT CTGGGCTGCA CGGG 24

(2) INFORMATION FOR SEQ ID NO: 7:

    (i) SEQUENCE CHARACTERISTICS:
        (A) LENGTH: 24 base pairs
        (B) TYPE: nucleic acid
        (C) STRANDEDNESS: single
        (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: DNA

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:

ANNNCATCCT CTGGGCTGCA CGGG 24

(2) INFORMATION FOR SEQ ID NO: 8:

    (i) SEQUENCE CHARACTERISTICS:
        (A) LENGTH: 24 base pairs
        (B) TYPE: nucleic acid
        (C) STRANDEDNESS: single
        (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: DNA

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:

TNNNCATCCT CTGGGCTGCA CGGG 24




(2) INFORMATION FOR SEQ ID NO: 9:

    (i) SEQUENCE CHARACTERISTICS:
        (A) LENGTH: 24 base pairs
        (B) TYPE: nucleic acid
        (C) STRANDEDNESS: single
        (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: DNA

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:

GNNNCATCCT CTGGGCTGCA CGGG 24

(2) INFORMATION FOR SEQ ID NO: 10:

    (i) SEQUENCE CHARACTERISTICS:
        (A) LENGTH: 20 base pairs
        (B) TYPE: nucleic acid
        (C) STRANDEDNESS: single
        (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: DNA

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:

CCCGTGCAGC CCAGAGGATG 20

(2) INFORMATION FOR SEQ ID NO: 11:

    (i) SEQUENCE CHARACTERISTICS:
        (A) LENGTH: 23 base pairs
        (B) TYPE: nucleic acid
        (C) STRANDEDNESS: single
        (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: DNA

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:

GTTTTCCTGG ATGATGCCCT GGC 23

(2) INFORMATION FOR SEQ ID NO: 12:

    (i) SEQUENCE CHARACTERISTICS:
        (A) LENGTH: 25 base pairs
        (B) TYPE: nucleic acid
        (C) STRANDEDNESS: single
        (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: DNA

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:

CATGCTTTGA TGACGCTTCT GTATC 25



(2) INFORMATION FOR SEQ ID NO: 13:

    (i) SEQUENCE CHARACTERISTICS:
        (A) LENGTH: 23 base pairs
        (B) TYPE: nucleic acid
        (C) STRANDEDNESS: single
        (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: DNA

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13:

CNNNCATCCG ACCCAGGCGT GCG 23

(2) INFORMATION FOR SEQ ID NO: 14:

    (i) SEQUENCE CHARACTERISTICS:
        (A) LENGTH: 23 base pairs
        (B) TYPE: nucleic acid
        (C) STRANDEDNESS: single
        (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: DNA

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14:

ANNNCATCCG ACCCAGGCGT GCG 23

(2) INFORMATION FOR SEQ ID NO: 15:

    (i) SEQUENCE CHARACTERISTICS:
        (A) LENGTH: 23 base pairs
        (B) TYPE: nucleic acid
        (C) STRANDEDNESS: single
        (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: DNA

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15:

TNNNCATCCG ACCCAGGCGT GCG 23

(2) INFORMATION FOR SEQ ID NO: 16:

    (i) SEQUENCE CHARACTERISTICS:
        (A) LENGTH: 23 base pairs
        (B) TYPE: nucleic acid
        (C) STRANDEDNESS: single
        (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: DNA

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16:

GNNNCATCCG ACCCAGGCGT GCG 23




(2) INFORMATION FOR SEQ ID NO: 17:

    (i) SEQUENCE CHARACTERISTICS:
        (A) LENGTH: 19 base pairs
        (B) TYPE: nucleic acid
        (C) STRANDEDNESS: single
        (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: DNA

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17:

CGCACGCCTG GGTCGGATG 19

(2) INFORMATION FOR SEQ ID NO: 18:

    (i) SEQUENCE CHARACTERISTICS:
        (A) LENGTH: 24 base pairs
        (B) TYPE: nucleic acid
        (C) STRANDEDNESS: single
        (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: DNA

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18:

CNNNCATCCT CTGGGCTGCA CGGG 24

(2) INFORMATION FOR SEQ ID NO: 19:

    (i) SEQUENCE CHARACTERISTICS:
        (A) LENGTH: 24 base pairs
        (B) TYPE: nucleic acid
        (C) STRANDEDNESS: single
        (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: DNA

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19:

ANNNCATCCT CTGGGCTGCA CGGG 24

(2) INFORMATION FOR SEQ ID NO: 20:

    (i) SEQUENCE CHARACTERISTICS:
        (A) LENGTH: 24 base pairs
        (B) TYPE: nucleic acid
        (C) STRANDEDNESS: single
        (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: DNA

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20:

TNNNCATCCT CTGGGCTGCA CGGG 24




(2) INFORMATION FOR SEQ ID NO: 21:

    (i) SEQUENCE CHARACTERISTICS:
        (A) LENGTH: 24 base pairs
        (B) TYPE: nucleic acid
        (C) STRANDEDNESS: single
        (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: DNA

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21:

GNNNCATCCT CTGGGCTGCA CGGG 24

(2) INFORMATION FOR SEQ ID NO: 22:

    (i) SEQUENCE CHARACTERISTICS:
        (A) LENGTH: 20 base pairs
        (B) TYPE: nucleic acid
        (C) STRANDEDNESS: single
        (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: DNA

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22:

CCCGTGCAGC CCAGAGGATG 20

(2) INFORMATION FOR SEQ ID NO: 23:

    (i) SEQUENCE CHARACTERISTICS:
        (A) LENGTH: 23 base pairs
        (B) TYPE: nucleic acid
        (C) STRANDEDNESS: single
        (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: DNA

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23:

CGCACGGCTG GGTCGGAGGA GNC 23

(2) INFORMATION FOR SEQ ID NO: 24:

    (i) SEQUENCE CHARACTERISTICS:
        (A) LENGTH: 23 base pairs
        (B) TYPE: nucleic acid
        (C) STRANDEDNESS: single
        (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: DNA

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24:

CGCACGGCTG GGTCGGAGGA GNA 23




(2) INFORMATION FOR SEQ ID NO: 25:

    (i) SEQUENCE CHARACTERISTICS:
        (A) LENGTH: 23 base pairs
        (B) TYPE: nucleic acid
        (C) STRANDEDNESS: single
        (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: DNA

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25:

CGCACGGCTG GGTCGGAGGA GNT 23

(2) INFORMATION FOR SEQ ID NO: 26:

    (i) SEQUENCE CHARACTERISTICS:
        (A) LENGTH: 23 base pairs
        (B) TYPE: nucleic acid
        (C) STRANDEDNESS: single
        (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: DNA

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26:

CGCACGGCTG GGTCGGAGGA GNG 23

(2) INFORMATION FOR SEQ ID NO: 27:

    (i) SEQUENCE CHARACTERISTICS:
        (A) LENGTH: 21 base pairs
        (B) TYPE: nucleic acid
        (C) STRANDEDNESS: single
        (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: DNA

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27:

CTCCTCCGAC CCAGCCGTGC G 21

(2) INFORMATION FOR SEQ ID NO: 28:

    (i) SEQUENCE CHARACTERISTICS:
        (A) LENGTH: 24 base pairs
        (B) TYPE: nucleic acid
        (C) STRANDEDNESS: single
        (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: DNA

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28:

GGTGCGCCAG TCCAGCGAGG AGNC 24




(2) INFORMATION FOR SEQ ID NO: 29:

    (i) SEQUENCE CHARACTERISTICS:
        (A) LENGTH: 24 base pairs
        (B) TYPE: nucleic acid
        (C) STRANDEDNESS: single
        (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: DNA

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29:

GGTGCGCCAG TCCAGCGAGG AGNA 24

(2) INFORMATION FOR SEQ ID NO: 30:

    (i) SEQUENCE CHARACTERISTICS:
        (A) LENGTH: 24 base pairs
        (B) TYPE: nucleic acid
        (C) STRANDEDNESS: single
        (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: DNA

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30:

GGTGCGCCAG TCCAGCGAGG AGNT 24

(2) INFORMATION FOR SEQ ID NO: 31:

    (i) SEQUENCE CHARACTERISTICS:
        (A) LENGTH: 24 base pairs
        (B) TYPE: nucleic acid
        (C) STRANDEDNESS: single
        (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: DNA

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31:

GGTGCGCCAG TCCAGCGAGG AGNG 24

(2) INFORMATION FOR SEQ ID NO: 32:

    (i) SEQUENCE CHARACTERISTICS:
        (A) LENGTH: 22 base pairs
        (B) TYPE: nucleic acid
        (C) STRANDEDNESS: single
        (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: DNA

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32:

CTCCTCGCTG GACTGGCGCA CC 22




(2) INFORMATION FOR SEQ ID NO: 33:

    (i) SEQUENCE CHARACTERISTICS:
        (A) LENGTH: 35 base pairs
        (B) TYPE: nucleic acid
        (C) STRANDEDNESS: single
        (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: DNA

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33:

TCTGTTCTCA GTTTTCCTGG ATGAGGAGTG GCACC 35

(2) INFORMATION FOR SEQ ID NO: 34:

    (i) SEQUENCE CHARACTERISTICS:
        (A) LENGTH: 21 base pairs
        (B) TYPE: nucleic acid
        (C) STRANDEDNESS: single
        (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: DNA

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34:

CGCACGGCTG GGTCGGAGGA G 21

(2) INFORMATION FOR SEQ ID NO: 35:

    (i) SEQUENCE CHARACTERISTICS:
        (A) LENGTH: 22 base pairs
        (B) TYPE: nucleic acid
        (C) STRANDEDNESS: single
        (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: DNA

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35:

GGTGCGCCAG TCCAGCGAGG AG 22

(2) INFORMATION FOR SEQ ID NO: 36:

    (i) SEQUENCE CHARACTERISTICS:
        (A) LENGTH: 22 base pairs
        (B) TYPE: nucleic acid
        (C) STRANDEDNESS: single
        (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: DNA

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36:

NNNCATCCGA CCCAGGCGTG CG 22



(2) INFORMATION FOR SEQ ID NO: 37:

    (i) SEQUENCE CHARACTERISTICS:
        (A) LENGTH: 19 base pairs
        (B) TYPE: nucleic acid
        (C) STRANDEDNESS: single
        (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: DNA

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37:

CGCACGCCTG GGTCGGATG 19

(2) INFORMATION FOR SEQ ID NO: 38:

    (i) SEQUENCE CHARACTERISTICS:
        (A) LENGTH: 23 base pairs
        (B) TYPE: nucleic acid
        (C) STRANDEDNESS: single
        (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: DNA

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38:

NNNCATCCTC TGGGCTGCAC GGG 23

(2) INFORMATION FOR SEQ ID NO: 39:

    (i) SEQUENCE CHARACTERISTICS:
        (A) LENGTH: 20 base pairs
        (B) TYPE: nucleic acid
        (C) STRANDEDNESS: single
        (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: DNA

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39:

CCCGTGCAGC CCAGAGGATG 20

(2) INFORMATION FOR SEQ ID NO: 40:

    (i) SEQUENCE CHARACTERISTICS:
        (A) LENGTH: 22 base pairs
        (B) TYPE: nucleic acid
        (C) STRANDEDNESS: single
        (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: DNA

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 40:

CCATCCGTAA GATGATCTTC TG 22



(2) INFORMATION FOR SEQ ID NO: 41:

    (i) SEQUENCE CHARACTERISTICS:
        (A) LENGTH: 18 base pairs
        (B) TYPE: nucleic acid
        (C) STRANDEDNESS: single
        (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: DNA

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 41:

CTCAGAATGA CTTGGTTG 18